From zero to routed inference in 60 seconds.
Make sure each device has Ollama installed and at least one model pulled (for example, ollama pull llama3.2:3b). On the machine you want as the router (typically your most powerful device):
pip install ollama-herd
herd
The router starts on port 11435. You'll see:
Ollama Herd ready on port 11435
On each device running Ollama (including the router machine if it also runs Ollama):
pip install ollama-herd
herd-node
The node discovers the router automatically via mDNS:
Discovered router at 10.0.0.100:11435
Heartbeat sent: 2 models loaded, 128GB available
Can't use mDNS? Connect directly:
herd-node --router-url http://10.0.0.100:11435
Check that nodes are online:
curl -s http://localhost:11435/fleet/status | python3 -m json.tool
You should see your nodes listed with their models, memory, and status.
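If you'd rather check fleet status from a script than eyeball the JSON, a small sketch like this works. Note that the exact response schema assumed here (a "nodes" list with "name", "status", and "models" keys) is my guess based on the fields described above; adjust it to match what your /fleet/status endpoint actually returns.

```python
import json
import urllib.request

def summarize_fleet(status: dict) -> list[str]:
    """Return one human-readable line per node.

    Assumes a hypothetical schema: {"nodes": [{"name", "status", "models"}]}.
    """
    lines = []
    for node in status.get("nodes", []):
        models = ", ".join(node.get("models", [])) or "none"
        lines.append(f'{node.get("name", "?")}: {node.get("status", "?")} ({models})')
    return lines

if __name__ == "__main__":
    # Query the router (replace localhost with the router's LAN IP if remote).
    with urllib.request.urlopen("http://localhost:11435/fleet/status") as resp:
        status = json.load(resp)
    print("\n".join(summarize_fleet(status)))
```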
Or open the dashboard in your browser:
http://localhost:11435/dashboard
OpenAI format:
curl http://localhost:11435/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama3.2:3b",
"messages": [{"role": "user", "content": "Hello from the fleet!"}],
"stream": false
}'
Ollama format:
curl http://localhost:11435/api/chat -d '{
"model": "llama3.2:3b",
"messages": [{"role": "user", "content": "Hello from the fleet!"}],
"stream": false
}'
The router scores all available nodes and routes the request to the best one. Check the response headers to see which node handled it:
curl -v http://localhost:11435/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Which node am I on?"}], "stream": false}' \
2>&1 | grep X-Fleet
< X-Fleet-Node: mac-studio-ultra
< X-Fleet-Score: 85
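To pick those headers out programmatically rather than grepping verbose curl output, a filter on the X-Fleet- prefix shown above is enough. This is a sketch; it only assumes the two headers demonstrated in the output.

```python
import json
import urllib.request

def fleet_headers(headers) -> dict:
    """Collect every X-Fleet-* response header from (name, value) pairs."""
    return {k: v for k, v in headers if k.lower().startswith("x-fleet-")}

if __name__ == "__main__":
    req = urllib.request.Request(
        "http://localhost:11435/v1/chat/completions",
        data=json.dumps({
            "model": "llama3.2:3b",
            "messages": [{"role": "user", "content": "Which node am I on?"}],
            "stream": False,
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # drain the body; we only care about the headers here
        print(fleet_headers(resp.getheaders()))
```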
Point any OpenAI-compatible tool at the router — no code changes needed:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11435/v1", api_key="not-needed")
response = client.chat.completions.create(
model="llama3.2:3b",
messages=[{"role": "user", "content": "Hello!"}],
stream=True,
)
for chunk in response:
print(chunk.choices[0].delta.content or "", end="")
Replace localhost with the router's LAN IP if connecting from another machine.
Open http://localhost:11435/dashboard to see your fleet in real time.
To upgrade:
pip install --upgrade ollama-herd
Restart the router and node agents after upgrading. See CHANGELOG for what's new.