Open WebUI is a chat interface; Ollama Herd is a routing engine. Open WebUI connects to multiple Ollama backends but picks one at random. Herd evaluates 7 signals per request to find the optimal node. They are complementary: point Open WebUI at Herd for intelligent routing behind a beautiful chat UI.
Open WebUI is the most popular self-hosted chat interface for local LLMs, with over 126K GitHub stars. It provides a polished, ChatGPT-like browser experience on top of Ollama and other backends, with features like multi-user auth, RAG document upload, conversation history, web search integration, and a plugin system. It supports connecting to multiple Ollama instances and aggregating their models.
Ollama Herd is an open-source smart multimodal AI router that turns multiple Ollama instances across Apple Silicon devices into one intelligent endpoint. It routes LLMs, embeddings, image generation, speech-to-text, and vision with a 7-signal scoring engine, mDNS auto-discovery, and an 8-tab real-time dashboard. Two commands to set up, zero config files. `pip install ollama-herd` or `brew install ollama-herd`.
"Open WebUI supports multiple Ollama instances" and "Ollama Herd routes across multiple Ollama instances" sound like the same thing. They are not.
Open WebUI's multi-backend support is a connection manager. It knows that multiple backends exist and can send requests to any of them, but its selection is essentially random or round-robin — it does not evaluate which backend is best for a given request at a given moment.
Ollama Herd is a routing engine. It evaluates 7 signals in real-time (model availability, GPU memory, system memory, thermal state, queue depth, historical performance, node health) to determine the optimal node for each request. It also handles retry, fallback, queue management, and health monitoring.
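To make the idea of multi-signal scoring concrete, here is a toy sketch of how seven normalized signals could be combined into a weighted node score. The signal names mirror the list above, but the weights, normalization, and node data are illustrative assumptions — not Herd's actual values or internals:

```python
# Hypothetical sketch of 7-signal weighted node scoring.
# Weights and normalization are illustrative, not Herd's real values.

WEIGHTS = {
    "model_available": 0.30,    # hard requirement, weighted heavily
    "gpu_memory_free": 0.20,    # fraction of VRAM free (0..1)
    "system_memory_free": 0.10,
    "thermal_ok": 0.15,         # 1.0 = nominal, 0.0 = critical pressure
    "queue_slack": 0.10,        # 1 - queue_depth / max_queue
    "historical_speed": 0.10,   # normalized tokens/sec from past runs
    "healthy": 0.05,
}

def score(node: dict) -> float:
    """Weighted sum of normalized signals; higher is better."""
    if not node["model_available"] or not node["healthy"]:
        return 0.0  # unusable nodes are never selected
    return sum(WEIGHTS[k] * node[k] for k in WEIGHTS)

nodes = {
    "mac-studio": {"model_available": 1, "gpu_memory_free": 0.8,
                   "system_memory_free": 0.7, "thermal_ok": 1.0,
                   "queue_slack": 0.9, "historical_speed": 0.9, "healthy": 1},
    "macbook-pro": {"model_available": 1, "gpu_memory_free": 0.3,
                    "system_memory_free": 0.4, "thermal_ok": 0.2,  # throttling
                    "queue_slack": 0.5, "historical_speed": 0.6, "healthy": 1},
}

best = max(nodes, key=lambda n: score(nodes[n]))
print(best)  # the cooler, less loaded machine wins
```

The key property this illustrates: a thermally throttled or memory-starved node keeps losing the comparison even though it is technically online — something a random or round-robin picker cannot express.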
| Feature | Open WebUI Multi-Backend | Ollama Herd |
|---|---|---|
| Connect to multiple Ollama instances | Yes | Yes (auto-discovery via mDNS) |
| Manual backend configuration | Yes (add URLs in settings) | Optional (zero-config with mDNS) |
| Auto-discover new nodes | No | Yes — mDNS, no config needed |
| Model aggregation | Yes | Yes |
| Intelligent request routing | No — random/round-robin | Yes — 7-signal weighted scoring |
| GPU memory awareness | No | Yes — real-time VRAM tracking |
| System memory awareness | No | Yes |
| Thermal state monitoring | No | Yes — reads macOS thermal pressure |
| Queue depth tracking | No | Yes — backpressure when nodes are overloaded |
| Adaptive capacity learning | No | Yes — learns node performance over time |
| Meeting detection | No | Yes — reduces load on machines in active calls |
| Auto-retry on failure | No — shows error to user | Yes — transparent retry on next-best node |
| Model fallbacks | No | Yes — falls back to compatible alternatives |
| Multimodal routing | Partial — image gen via backends | Full — LLMs, embeddings, image gen, STT |
| Embedding routing | No — single backend | Yes — scored routing across fleet |
| Health monitoring | Basic connection check | 17 health checks with detailed status |
| Request tracing | No | Yes — distributed tracing across nodes |
| Performance dashboard | No (has usage stats) | Yes — 8-tab dashboard |
| Smart benchmarking | No | Yes — calibrates scoring per node |
| Dynamic context optimization | No | Yes — adjusts context window per node capability |
| OpenAI API compatibility | Yes (as a consumer) | Yes (as a provider) |
| Ollama API compatibility | Yes (as a consumer) | Yes (as a provider) |
Open WebUI does things that Herd does not do and has no plans to do: the browser chat experience itself, multi-user authentication, RAG document upload, conversation history, web search integration, and its plugin system.
Open WebUI and Ollama Herd are not competitors. They are complementary.
Open WebUI is a frontend — the best way for humans to interact with local LLMs through a browser. Ollama Herd is a backend — the best way to route AI requests across a fleet of Apple Silicon machines.
The ideal setup:
```
[Open WebUI] --> [Ollama Herd :11435] --> [Mac Mini 1   :11434]
  (chat UI)        (smart routing)        [Mac Mini 2   :11434]
                                          [MacBook Pro  :11434]
                                          [Mac Studio   :11434]
```
Configure Open WebUI to point at Herd's endpoint (`http://herd-host:11435`) as its Ollama backend. Open WebUI sees it as a single Ollama instance, but behind that endpoint, Herd is scoring nodes, managing queues, handling retries, and routing to the optimal machine.
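If you run Open WebUI in Docker, pointing it at Herd is one environment variable. `OLLAMA_BASE_URL` is Open WebUI's standard backend setting; `herd-host` is a placeholder for whatever machine runs the Herd router:

```shell
docker run -d -p 3000:8080 \
  -e OLLAMA_BASE_URL=http://herd-host:11435 \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

If you installed Open WebUI another way, set the same URL under its Ollama connection settings instead.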
You get Open WebUI's excellent chat experience and Herd's intelligent routing. No compromise.
Without Herd: Open WebUI connects to multiple backends directly, picks one at random, and hopes for the best. If a node is thermal throttling or overloaded, Open WebUI doesn't know and can't adapt.
With Herd: Open WebUI connects to one endpoint. Every request is intelligently routed. Failed requests are retried transparently. The user never sees a backend error when a healthy node is available.
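The transparent-retry behavior described above can be sketched as a simple fall-through over nodes ranked best-to-worst. This is a simplified simulation; the function names and failure model are illustrative, not Herd's internals:

```python
# Simplified sketch of transparent retry: walk nodes in score order,
# fall through to the next-best node on failure. Names are illustrative.

def dispatch(request, ranked_nodes, send):
    """Try nodes from best to worst; raise only if every node fails."""
    errors = {}
    for node in ranked_nodes:
        try:
            return send(node, request)  # success: caller never sees earlier failures
        except ConnectionError as exc:
            errors[node] = exc          # record and fall through to the next node
    raise RuntimeError(f"all nodes failed: {errors}")

# Simulate a fleet where the top-ranked node is down.
def fake_send(node, request):
    if node == "mac-mini-1":
        raise ConnectionError("node offline")
    return f"{node} handled {request!r}"

result = dispatch("chat completion", ["mac-mini-1", "mac-studio"], fake_send)
print(result)  # the user only sees the successful response
```

The caller (here, Open WebUI) gets a normal response; the failed first attempt never surfaces as a user-facing error.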
Open WebUI is a chat interface that happens to support multiple backends. Ollama Herd is a routing engine that happens to work with chat interfaces.
They solve different problems at different layers of the stack. You don't choose one or the other — you use both.
The strongest configuration is Open WebUI as the frontend for human users, Herd as the backend for intelligent fleet routing, and your agent frameworks hitting Herd's API directly. Every request gets scored, queued, retried, and traced — whether it comes from a browser, a Python script, or an autonomous agent.
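An agent framework "hitting Herd's API directly" just means speaking the Ollama-compatible API at Herd's port. This sketch only builds the request rather than sending it; the host name and model are placeholders, and `/api/chat` is the standard Ollama chat endpoint that Herd, acting as an Ollama-compatible provider, would expose:

```python
import json

HERD_URL = "http://herd-host:11435"  # placeholder: your router machine

def build_chat_request(model: str, prompt: str):
    """Build an Ollama-style /api/chat request aimed at Herd's endpoint.

    Herd presents itself as a single Ollama instance, so any client
    that speaks the Ollama API can target it unchanged."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return f"{HERD_URL}/api/chat", json.dumps(payload)

url, body = build_chat_request("llama3.2", "Summarize this log file.")
print(url)
```

The same request works whether it comes from a browser session, a cron job, or an autonomous agent — the routing happens behind the endpoint.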
```
pip install ollama-herd   # or: brew install ollama-herd
herd                      # start the router
herd-node                 # on each device
```
You don't switch from Open WebUI — you upgrade its backend. Open WebUI is the chat interface, Herd is the routing layer behind it.
1. `pip install ollama-herd` on your router machine, then `herd` to start.
2. `herd-node` on each device running Ollama.
3. In Open WebUI's connection settings, change the Ollama backend URL from `http://localhost:11434` to `http://router-ip:11435`.

That's it. Open WebUI now gets intelligent routing instead of random backend selection. Every chat, RAG query, and image generation request goes to the best available machine.
**Can Open WebUI already use multiple Ollama backends on its own?** Yes, but with random or round-robin selection — not intelligent routing. Open WebUI knows that multiple backends exist and can send requests to any of them, but it does not evaluate which backend is best for a given request based on thermal state, memory pressure, queue depth, or historical performance.
**Can I use Open WebUI and Ollama Herd together?** Yes, and this is the recommended setup. Point Open WebUI at Herd's endpoint (`http://herd-host:11435`) as its Ollama backend. Open WebUI sees a single Ollama instance, but behind it Herd scores, queues, retries, and routes every request to the optimal node.
**Does Herd replace Open WebUI?** No. Herd is a backend routing service. If you want a browser-based chat experience, use Open WebUI as your frontend with Herd as the backend. If you are building agent applications, hit Herd's API directly.
**Which should I set up first?** Install Herd first, then configure Open WebUI to connect to Herd instead of directly to Ollama. This way, every request from Open WebUI benefits from intelligent routing from the start.
**How does Herd interact with Open WebUI's RAG feature?** Herd routes the embedding requests that RAG generates. Open WebUI handles the document pipeline and sends embedding requests to its backend. When that backend is Herd, those embedding requests get routed to the best node in your fleet for embedding workloads.
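As a toy illustration of what "routing embedding workloads across a fleet" means, here is a sketch that splits RAG chunks into batches and assigns each batch to the currently least-loaded node. The node names and the least-loaded policy are illustrative stand-ins — Herd's actual decision uses its full scoring engine, not this simplification:

```python
# Toy sketch: split document chunks into embedding batches and assign
# each batch to the least-loaded node. Illustrative only; Herd's real
# routing uses its multi-signal scoring, not this simple policy.

def assign_batches(chunks, nodes, batch_size=2):
    load = {n: 0 for n in nodes}        # outstanding batches per node
    plan = []
    for i in range(0, len(chunks), batch_size):
        batch = chunks[i:i + batch_size]
        node = min(load, key=load.get)  # pick the least-loaded node
        load[node] += 1
        plan.append((node, batch))
    return plan

chunks = ["c1", "c2", "c3", "c4", "c5"]
plan = assign_batches(chunks, ["mac-mini-1", "mac-mini-2"])
for node, batch in plan:
    print(node, batch)
```

The point is that a large RAG ingest fans out across machines instead of saturating one backend — which is exactly what a single-backend embedding path cannot do.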