Ollama Herd Guides

Start Here

Quickstart — From install to first routed request in 60 seconds. Create a fleet with two commands, send a request, and see it land on the right machine.
Core Concepts — The mental model behind Ollama Herd. Nodes, heartbeats, scoring signals, queues, capacity modes, and how they fit together.

Routing Engine — How the 5-stage scoring pipeline eliminates bad candidates, scores survivors across 7 signals, and picks a winner for every request.
Adaptive Capacity — How your fleet learns when each device has spare compute. Weekly behavioral models, meeting detection, app fingerprinting, and memory ceilings.

Claude Code CLI — Point Claude Code CLI at your hardware. Native Anthropic Messages API, three-layer context management, per-tier model routing, tool-schema fixup. Fixes the "breaks at 30K tokens" failure mode on local Qwen3-Coder models.
Integrations — Connect Ollama Herd to Open WebUI, LangChain, CrewAI, OpenClaw, Aider, Continue.dev, LlamaIndex, and any OpenAI-compatible client.
Deployment — Multi-node setup, monitoring, log analysis, health checks, graceful drain, and production tips.
API Reference — Every endpoint with request/response schemas, headers, error codes, and curl examples.