Docker Model Runner puts LLMs in your container workflow. Herd puts your entire Mac fleet to work. Docker Model Runner is a single-machine serving tool for the Docker-native developer. Herd is a multi-machine fleet router for teams with multiple Macs.
Docker Model Runner is Docker's native LLM inference feature, introduced in Docker Desktop in early 2025. It extends Docker's container model to include AI model serving as a first-class primitive, letting developers pull and run models with familiar `docker model` CLI commands. It supports GPU acceleration on NVIDIA hardware and Apple Silicon, integrates with docker-compose workflows, and exposes an OpenAI-compatible API for local development.
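Because the API is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal sketch with the standard library is below — the base URL, port, and model name are assumptions for illustration; check your Docker Desktop version's documentation for the actual endpoint.

```python
# Sketch: building a request for Docker Model Runner's OpenAI-compatible API.
# BASE_URL and the model name are illustrative, not guaranteed for your setup.
import json
import urllib.request

BASE_URL = "http://localhost:12434/engines/v1"  # assumed host endpoint

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("ai/llama3.2", "Say hello in one word.")
# urllib.request.urlopen(req) would send it; any OpenAI SDK works the same way.
```

The point is the shape, not the transport: the same request body works against any OpenAI-compatible server, which is what makes these tools interoperable.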
Ollama Herd is an open-source smart multimodal AI router that turns multiple Ollama instances across Apple Silicon devices into one intelligent endpoint. It routes LLMs, embeddings, image generation, speech-to-text, and vision with a 7-signal scoring engine, mDNS auto-discovery, and an 8-tab real-time dashboard. Two commands to set up, zero config files. `pip install ollama-herd` or `brew install ollama-herd`.
| Feature | Docker Model Runner | Ollama Herd |
|---|---|---|
| Core approach | Docker-native model serving | Fleet request routing (7-signal scoring) |
| Primary use case | Run models inside the Docker workflow | Route requests across a device fleet |
| Scope | Single machine | Multi-machine fleet |
| Model types | LLMs (text generation) | LLMs, embeddings, image gen, STT, vision |
| Key innovation | Models-as-containers UX | 7-signal adaptive routing with capacity learning |
| API compatibility | OpenAI-compatible | OpenAI + Ollama dual API |
| Device discovery | N/A (single machine) | mDNS auto-discovery |
| Load balancing | None | 7-signal scoring across fleet |
| Health monitoring | Docker health checks | 17 health checks, 7-signal scoring |
| Dashboard | Docker Desktop UI | 8-tab dashboard (fleet, models, routing, benchmarks) |
| Benchmarking | None | Built-in smart benchmark with statistical analysis |
| Context optimization | None | Dynamic context window optimization |
| Capacity learning | None | Learns per-model, per-node performance over time |
| Thermal awareness | None | Detects thermal throttling, adjusts routing |
| Meeting detection | None | Reduces load on machines running video calls |
| Container integration | Native (docker-compose, Docker networking) | None (standalone service) |
| Setup | Docker Desktop + enable Model Runner | `pip install ollama-herd` on one machine |
| Cross-platform | macOS, Windows, Linux (via Docker Desktop) | macOS (Apple Silicon focused) |
| Tests | Docker's internal testing | 480+ tests, 17 health checks |
| License | Proprietary (Docker Desktop) | MIT |
`docker model pull`, `docker model run` — the CLI patterns are instantly recognizable to the millions of developers who already know Docker. Zero learning curve for the serving layer.

`pip install ollama-herd` or `brew install ollama-herd`. No Docker Desktop subscription, no container runtime overhead, no daemon running in the background.

Docker Model Runner extends the container paradigm to model serving — it's a single-machine tool that makes LLMs fit the Docker workflow. Ollama Herd is a fleet orchestration layer that makes multiple machines act as one intelligent AI system.
Docker Model Runner is about developer workflow integration. Herd is about distributed fleet intelligence.
Docker Model Runner is early. Docker is a company that iterates quickly and has massive distribution — Docker Desktop is on millions of developer machines. Features like multi-model management, batching, and better GPU utilization are likely coming.
What Docker is unlikely to build: multi-machine fleet routing. Docker's mental model is containers on a host (or orchestrated via Kubernetes/Swarm). Cross-machine AI routing with adaptive scoring is a different problem space entirely. Even Docker Swarm mode doesn't do intelligent workload-aware routing — it does round-robin or simple load balancing.
| Scenario | Choose |
|---|---|
| Docker-native dev workflow, single machine | Docker Model Runner |
| Team with multiple Macs sharing AI workloads | Ollama Herd |
| Need models in docker-compose alongside other services | Docker Model Runner |
| Need multimodal routing (LLM + embeddings + image gen + STT) | Ollama Herd |
| Mixed OS environment (Mac + Windows + Linux) | Docker Model Runner |
| Apple Silicon fleet, want zero-config discovery | Ollama Herd |
| CI/CD pipeline needs a local model for testing | Docker Model Runner |
| Need intelligent load balancing across devices | Ollama Herd |
| Want container isolation for model serving | Docker Model Runner |
| Want fleet-wide operational dashboard | Ollama Herd |
Docker Model Runner is Docker doing what Docker does — making infrastructure accessible through familiar tooling. For a single developer who already lives in Docker, it's a natural way to add local model serving. Pull a model, add it to your compose file, call it from your app. Simple, clean, Docker-native.
Ollama Herd solves a different problem: making a fleet of Apple Silicon devices work together as one AI system. It doesn't care about containers or compose files. It cares about which of your 6 Macs has the most VRAM headroom, the lowest queue depth, and the best historical performance for the model you need right now.
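Herd's actual 7-signal engine is its own implementation, but the idea can be sketched: normalize each signal (VRAM headroom, queue depth, learned throughput, thermal state, meeting detection), combine them into a per-node score, and route to the highest scorer. Everything below — signal names, weights, thresholds — is made up for illustration and is not Herd's real scoring code.

```python
# Illustrative only: a simplified multi-signal node scorer in the spirit of
# fleet routing. Weights and signal names are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    vram_free_gb: float      # unified-memory headroom
    queue_depth: int         # requests already waiting
    tokens_per_sec: float    # learned historical throughput for this model
    thermal_throttled: bool  # thermal pressure detected
    in_meeting: bool         # video call running on this machine

def score(n: Node) -> float:
    s = 0.0
    s += min(n.vram_free_gb / 32.0, 1.0) * 3.0  # reward headroom, capped
    s += (n.tokens_per_sec / 100.0) * 2.0       # reward historical speed
    s -= n.queue_depth * 1.5                    # penalize busy nodes
    if n.thermal_throttled:
        s -= 2.0                                # back off hot machines
    if n.in_meeting:
        s -= 4.0                                # keep video calls smooth
    return s

def pick_node(fleet: list[Node]) -> Node:
    """Route the request to the best-scoring node right now."""
    return max(fleet, key=score)

fleet = [
    Node("studio", vram_free_gb=48.0, queue_depth=0,
         tokens_per_sec=90.0, thermal_throttled=False, in_meeting=False),
    Node("laptop", vram_free_gb=8.0, queue_depth=2,
         tokens_per_sec=40.0, thermal_throttled=False, in_meeting=True),
]
print(pick_node(fleet).name)  # the idle, high-headroom studio wins
```

The interesting design property is that scores are recomputed per request, so routing shifts as queues drain, machines heat up, or meetings start and end.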
The typical Docker Model Runner user is a developer who wants a local LLM in their Docker dev environment. The typical Herd user is a team that wants their collective Mac hardware to serve as a shared AI platform. Docker Model Runner is a serving tool. Herd is an orchestration layer. They might coexist — a Docker Model Runner instance could expose an OpenAI-compatible endpoint that Herd routes to — but they're solving fundamentally different problems.
```bash
pip install ollama-herd   # or: brew install ollama-herd
herd                      # start the router
herd-node                 # run on each device in the fleet
```
Using Docker Model Runner for local dev? Herd complements it — Docker Model Runner serves models on one machine, Herd routes across your whole team's machines. Try Herd on two Macs to see what fleet routing adds.
They solve different problems. Docker Model Runner integrates LLM serving into your Docker workflow on a single machine. Ollama Herd routes AI requests across multiple devices with intelligent scoring. If you want models alongside your containers in docker-compose, use Docker Model Runner. If you want your team's Macs working together as one AI system, use Herd.
Yes, in principle. Docker Model Runner exposes an OpenAI-compatible API, and Herd can route to OpenAI-compatible endpoints. You could run Docker Model Runner on one machine and Ollama on others, with Herd routing across all of them through a single unified API.
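The reason this works is that both backends speak the same chat-completions shape, so a router only needs a base URL per node. A tiny sketch — hostnames and ports below are assumptions, though `11434` and the `/v1` prefix are Ollama's documented defaults:

```python
# Illustrative: one routing table covering both backend types. The router
# can treat every node uniformly because the request/response shape matches.
backends = {
    "devbox-model-runner": "http://devbox.local:12434/engines/v1",  # assumed
    "studio-ollama":       "http://studio.local:11434/v1",
}

def completion_url(node: str) -> str:
    """Resolve a node name to its chat-completions endpoint."""
    return f"{backends[node]}/chat/completions"

print(completion_url("studio-ollama"))
```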
Docker Model Runner is a single-machine tool — each developer runs models on their own laptop. Ollama Herd turns the entire team's devices into a shared AI fleet with intelligent routing, so everyone benefits from the collective hardware. A team of five with MacBooks has 120–480GB of unified memory available through Herd.
No. Herd installs via pip or Homebrew and runs as a lightweight Python service. No Docker Desktop subscription, no container runtime, no virtualization overhead. On Apple Silicon, this means the model talks directly to Metal/MLX with no container layer in between.
Yes. Open-source, MIT license. No paid tiers, no API keys, no subscriptions.