Docker Model Runner puts LLMs in your container workflow. Herd puts your entire Mac fleet to work. Docker Model Runner is a single-machine serving tool for the Docker-native developer. Herd is a multi-machine fleet router for teams with multiple Macs.
Docker Model Runner is Docker's native LLM inference feature, introduced in Docker Desktop in early 2025. It extends Docker's container model to include AI model serving as a first-class primitive, letting developers pull and run models with familiar `docker model` CLI commands. It supports GPU acceleration on NVIDIA hardware and Apple Silicon, integrates with docker-compose workflows, and exposes an OpenAI-compatible API for local development.
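Because the API is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal sketch with the standard library is below — the base URL, port, and model name are assumptions for illustration; check your Docker Desktop version's documentation for the actual endpoint.

```python
# Sketch: building a request for Docker Model Runner's OpenAI-compatible API.
# BASE_URL and the model name are illustrative, not guaranteed for your setup.
import json
import urllib.request

BASE_URL = "http://localhost:12434/engines/v1"  # assumed host endpoint

def chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a standard OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = chat_request("ai/llama3.2", "Say hello in one word.")
# urllib.request.urlopen(req) would send it; any OpenAI SDK works the same way.
```

The point is the shape, not the transport: the same request body works against any OpenAI-compatible server, which is what makes these tools interoperable.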
Ollama Herd is an open-source smart multimodal AI router that turns multiple Ollama instances across Apple Silicon devices into one intelligent endpoint. It routes LLMs, embeddings, image generation, speech-to-text, and vision with a 7-signal scoring engine, mDNS auto-discovery, and an 8-tab real-time dashboard. Two commands to set up, zero config files. `pip install ollama-herd` or `brew install ollama-herd`.
| Feature | Docker Model Runner | Ollama Herd |
|---|---|---|
| Core approach | Docker-native model serving | Fleet request routing (7-signal scoring) |
| Primary use case | Run models inside the Docker workflow | Route requests across a device fleet |
| Scope | Single machine | Multi-machine fleet |
| Model types | LLMs (text generation) | LLMs, embeddings, image gen, STT, vision |
| Key innovation | Models-as-containers UX | 7-signal adaptive routing with capacity learning |
| API compatibility | OpenAI-compatible | OpenAI + Ollama dual API |
| Device discovery | N/A (single machine) | mDNS auto-discovery |
| Load balancing | None | 7-signal scoring across fleet |
| Health monitoring | Docker health checks | 17 health checks, 7-signal scoring |
| Dashboard | Docker Desktop UI | 8-tab dashboard (fleet, models, routing, benchmarks) |
| Benchmarking | None | Built-in smart benchmark with statistical analysis |
| Context optimization | None | Dynamic context window optimization |
| Capacity learning | None | Learns per-model, per-node performance over time |
| Thermal awareness | None | Detects thermal throttling, adjusts routing |
| Meeting detection | None | Reduces load on machines running video calls |
| Container integration | Native (docker-compose, Docker networking) | None (standalone service) |
| Setup | Docker Desktop + enable Model Runner | `pip install ollama-herd` on one machine |
| Cross-platform | macOS, Windows, Linux (via Docker Desktop) | macOS (Apple Silicon focused) |
| Tests | Docker's internal testing | 480+ tests, 17 health checks |
| License | Proprietary (Docker Desktop) | MIT |
`docker model pull`, `docker model run` — the CLI patterns are instantly recognizable to the millions of developers who already know Docker. Zero learning curve for the serving layer.

`pip install ollama-herd` or `brew install ollama-herd`. No Docker Desktop subscription, no container runtime overhead, no daemon running in the background.

Docker Model Runner extends the container paradigm to model serving — it's a single-machine tool that makes LLMs fit the Docker workflow. Ollama Herd is a fleet orchestration layer that makes multiple machines act as one intelligent AI system.
Docker Model Runner is about developer workflow integration. Herd is about distributed fleet intelligence.
Docker Model Runner is early. Docker is a company that iterates quickly and has massive distribution — Docker Desktop is on millions of developer machines. Features like multi-model management, batching, and better GPU utilization are likely coming.
What Docker is unlikely to build: multi-machine fleet routing. Docker's mental model is containers on a host (or orchestrated via Kubernetes/Swarm). Cross-machine AI routing with adaptive scoring is a different problem space entirely. Even Docker Swarm mode doesn't do intelligent workload-aware routing — it does round-robin or simple load balancing.
| Scenario | Choose |
|---|---|
| Docker-native dev workflow, single machine | Docker Model Runner |
| Team with multiple Macs sharing AI workloads | Ollama Herd |
| Need models in docker-compose alongside other services | Docker Model Runner |
| Need multimodal routing (LLM + embeddings + image gen + STT) | Ollama Herd |
| Mixed OS environment (Mac + Windows + Linux) | Docker Model Runner |
| Apple Silicon fleet, want zero-config discovery | Ollama Herd |
| CI/CD pipeline needs a local model for testing | Docker Model Runner |
| Need intelligent load balancing across devices | Ollama Herd |
| Want container isolation for model serving | Docker Model Runner |
| Want fleet-wide operational dashboard | Ollama Herd |
Docker Model Runner is Docker doing what Docker does — making infrastructure accessible through familiar tooling. For a single developer who already lives in Docker, it's a natural way to add local model serving. Pull a model, add it to your compose file, call it from your app. Simple, clean, Docker-native.
Ollama Herd solves a different problem: making a fleet of Apple Silicon devices work together as one AI system. It doesn't care about containers or compose files. It cares about which of your 6 Macs has the most VRAM headroom, the lowest queue depth, and the best historical performance for the model you need right now.
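Herd's actual 7-signal engine is its own implementation, but the idea can be sketched: normalize each signal (VRAM headroom, queue depth, learned throughput, thermal state, meeting detection), combine them into a per-node score, and route to the highest scorer. Everything below — signal names, weights, thresholds — is made up for illustration and is not Herd's real scoring code.

```python
# Illustrative only: a simplified multi-signal node scorer in the spirit of
# fleet routing. Weights and signal names are assumptions for the sketch.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    vram_free_gb: float      # unified-memory headroom
    queue_depth: int         # requests already waiting
    tokens_per_sec: float    # learned historical throughput for this model
    thermal_throttled: bool  # thermal pressure detected
    in_meeting: bool         # video call running on this machine

def score(n: Node) -> float:
    s = 0.0
    s += min(n.vram_free_gb / 32.0, 1.0) * 3.0  # reward headroom, capped
    s += (n.tokens_per_sec / 100.0) * 2.0       # reward historical speed
    s -= n.queue_depth * 1.5                    # penalize busy nodes
    if n.thermal_throttled:
        s -= 2.0                                # back off hot machines
    if n.in_meeting:
        s -= 4.0                                # keep video calls smooth
    return s

def pick_node(fleet: list[Node]) -> Node:
    """Route the request to the best-scoring node right now."""
    return max(fleet, key=score)

fleet = [
    Node("studio", vram_free_gb=48.0, queue_depth=0,
         tokens_per_sec=90.0, thermal_throttled=False, in_meeting=False),
    Node("laptop", vram_free_gb=8.0, queue_depth=2,
         tokens_per_sec=40.0, thermal_throttled=False, in_meeting=True),
]
print(pick_node(fleet).name)  # the idle, high-headroom studio wins
```

The interesting design property is that scores are recomputed per request, so routing shifts as queues drain, machines heat up, or meetings start and end.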
The typical Docker Model Runner user is a developer who wants a local LLM in their Docker dev environment. The typical Herd user is a team that wants their collective Mac hardware to serve as a shared AI platform. Docker Model Runner is a serving tool. Herd is an orchestration layer. They might coexist — a Docker Model Runner instance could expose an OpenAI-compatible endpoint that Herd routes to — but they're solving fundamentally different problems.
```bash
pip install ollama-herd   # or: brew install ollama-herd
herd                      # start the router
herd-node                 # run on each device in the fleet
```
Using Docker Model Runner for local dev? Herd complements it — Docker Model Runner serves models on one machine, Herd routes across your whole team's machines. Try Herd on two Macs to see what fleet routing adds.
They solve different problems. Docker Model Runner integrates LLM serving into your Docker workflow on a single machine. Ollama Herd routes AI requests across multiple devices with intelligent scoring. If you want models alongside your containers in docker-compose, use Docker Model Runner. If you want your team's Macs working together as one AI system, use Herd.
Yes, in principle. Docker Model Runner exposes an OpenAI-compatible API, and Herd can route to OpenAI-compatible endpoints. You could run Docker Model Runner on one machine and Ollama on others, with Herd routing across all of them through a single unified API.
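The reason this works is that both backends speak the same chat-completions shape, so a router only needs a base URL per node. A tiny sketch — hostnames and ports below are assumptions, though `11434` and the `/v1` prefix are Ollama's documented defaults:

```python
# Illustrative: one routing table covering both backend types. The router
# can treat every node uniformly because the request/response shape matches.
backends = {
    "devbox-model-runner": "http://devbox.local:12434/engines/v1",  # assumed
    "studio-ollama":       "http://studio.local:11434/v1",
}

def completion_url(node: str) -> str:
    """Resolve a node name to its chat-completions endpoint."""
    return f"{backends[node]}/chat/completions"

print(completion_url("studio-ollama"))
```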
Docker Model Runner is a single-machine tool — each developer runs models on their own laptop. Ollama Herd turns the entire team's devices into a shared AI fleet with intelligent routing, so everyone benefits from the collective hardware. A team of five with MacBooks has 120–480GB of unified memory available through Herd.
No. Herd installs via pip or Homebrew and runs as a lightweight Python service. No Docker Desktop subscription, no container runtime, no virtualization overhead. On Apple Silicon, this means the model talks directly to Metal/MLX with no container layer in between.
Yes. Open-source, MIT license. No paid tiers, no API keys, no subscriptions.