
Adaptive Capacity

How your fleet learns when each device has spare compute — and adjusts routing automatically.

The Problem

Your laptop isn't a server. It has an owner who uses it for meetings, coding, video editing, and browsing. Routing inference to a machine during a video call makes both the call and the inference terrible.

The adaptive capacity system watches usage patterns over time, builds a behavioral model, and tells the router when each device has spare capacity — without reading application names or invading privacy.

Three Components

Component           What It Does                                        Platform
-----------------   -------------------------------------------------   -------------
Capacity Learner    Builds a weekly behavioral model of each device     All platforms
Meeting Detector    Detects active cameras/microphones for hard-pause   macOS only
App Fingerprinter   Classifies workload intensity from system metrics   All platforms

Capacity Learner

The Weekly Model

The learner maintains 168 slots — one for each hour of the week (Monday 00:00 through Sunday 23:00). Each slot accumulates observations of CPU and memory usage, weighted with exponential decay so recent behavior matters more.

Monday 9am:   65% CPU, 78% memory  --> low availability
Monday 2am:    3% CPU, 45% memory  --> high availability
Saturday 3pm: 25% CPU, 55% memory  --> moderate availability
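A minimal sketch of how such a 168-slot model might accumulate decayed observations. This is illustrative, not the actual implementation: `SlotModel` and its method names are invented here, while the slot count and the 15-day half-life come from this guide.

```python
from datetime import datetime

HOURS_PER_WEEK = 168   # 24 hours x 7 days
HALF_LIFE_DAYS = 15.0  # an observation loses half its weight every 15 days

def slot_index(ts: datetime) -> int:
    """Map a timestamp to one of the 168 hour-of-week slots (Mon 00:00 = slot 0)."""
    return ts.weekday() * 24 + ts.hour

class SlotModel:
    """One slot's exponentially decayed running average of CPU/memory usage."""

    def __init__(self):
        self.weight = 0.0
        self.cpu_sum = 0.0
        self.mem_sum = 0.0

    def decay(self, days_elapsed: float) -> None:
        # Shrink old evidence before folding in new observations.
        factor = 0.5 ** (days_elapsed / HALF_LIFE_DAYS)
        self.weight *= factor
        self.cpu_sum *= factor
        self.mem_sum *= factor

    def observe(self, cpu_pct: float, mem_pct: float) -> None:
        self.weight += 1.0
        self.cpu_sum += cpu_pct
        self.mem_sum += mem_pct

    def averages(self):
        """Weighted-average (cpu, mem) for this hour slot, or None if unseen."""
        if self.weight == 0.0:
            return None
        return self.cpu_sum / self.weight, self.mem_sum / self.weight
```

Because both the sums and the total weight decay by the same factor, the averages stay stable over quiet periods; only new observations move them.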

Availability Score

After a 7-day bootstrap period, the learner combines three signals into a real-time score (0.0 to 1.0):

Signal                   Weight   What
----------------------   ------   -----------------------------------------------------
Historical baseline      40%      What you usually do at this hour of the week
Current observed state   40%      What's happening right now
CPU trend                20%      Is activity rising or falling over the last 5 minutes
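The weighted combination is straightforward; here is one plausible shape for it. The function name and the mapping of the signed trend into [0, 1] are assumptions, but the 40/40/20 weights are from the table above.

```python
def availability_score(historical: float, current: float, trend: float) -> float:
    """Combine the three signals into a 0.0-1.0 availability score.

    historical, current: availability estimates in [0, 1] (1.0 = fully idle)
    trend: CPU trend in [-1, 1]; rising CPU (+1) should lower availability
    """
    # Map the signed trend into [0, 1]: falling CPU counts as availability.
    trend_component = (1.0 - trend) / 2.0
    score = 0.40 * historical + 0.40 * current + 0.20 * trend_component
    return max(0.0, min(1.0, score))
```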

Score to Memory Ceiling

The availability score maps to how much memory the router can use on this device:

Score       Mode             Memory Ceiling     What It Means
---------   --------------   ----------------   -------------------------
0.80–1.00   Full             80% of total RAM   Full fleet participant
0.60–0.80   Learned high     50% (max 64GB)     Normal priority
0.40–0.60   Learned medium   25% (max 32GB)     Small models only
0.20–0.40   Learned low      12.5% (max 16GB)   Minimal, lightweight only
0.00–0.20   Paused           0GB                No routing
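The mapping can be sketched as a top-down band check. The mode strings and boundary handling (score >= the lower bound of each band, checked from the top) are assumptions, since the table's bands share their edge values; the fractions and caps are from the table.

```python
def memory_ceiling_gb(score, total_ram_gb):
    """Map an availability score to a (mode, ceiling-in-GB) pair."""
    bands = [
        (0.80, "full",           0.80,  None),  # 80% of RAM, no absolute cap
        (0.60, "learned-high",   0.50,  64.0),
        (0.40, "learned-medium", 0.25,  32.0),
        (0.20, "learned-low",    0.125, 16.0),
    ]
    for lower_bound, mode, fraction, cap in bands:
        if score >= lower_bound:
            ceiling = total_ram_gb * fraction
            if cap is not None:
                ceiling = min(ceiling, cap)
            return mode, ceiling
    return "paused", 0.0
```

Note how the absolute caps matter on large machines: a 256GB workstation at "learned high" is held to 64GB, not 128GB.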

The router reads ceiling_gb from each heartbeat and uses it for scoring (memory fit) and elimination (can the model fit?).

Bootstrap Period

The first 7 days are observation-only: the learner records CPU and memory usage into its hourly slots but contributes no capacity to the fleet, so the router sends the node no work until a full week of behavior has been seen.

Exponential Decay

Observations use a 15-day half-life: an observation from 15 days ago carries half the weight of one made today. Recent behavior therefore dominates the model, and when your schedule changes, the learned pattern follows it within a few weeks instead of clinging to stale habits.

Persistence

State saves to disk every ~5 minutes at ~/.fleet-manager/capacity-learner-{node-id}.json. On restart, learned state is restored. The bootstrap countdown continues from the first-ever observation.
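A sketch of how that persistence might work. The file path pattern is from this guide; the function names, the state's JSON shape (including a `first_observation` timestamp for the bootstrap countdown), and the atomic-rename strategy are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path

def state_path(node_id, data_dir):
    return Path(data_dir) / f"capacity-learner-{node_id}.json"

def save_state(state, node_id, data_dir):
    """Write atomically: dump to a temp file, then rename over the target."""
    path = state_path(node_id, data_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)  # atomic on POSIX: a crash never leaves a partial file
    return path

def load_state(node_id, data_dir):
    """Restore learned state on restart; None if this node has never saved."""
    path = state_path(node_id, data_dir)
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

Keeping `first_observation` in the saved state is what lets the 7-day bootstrap countdown survive restarts rather than resetting each time.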

Meeting Detection (macOS)

When a camera or microphone is active, the node is hard-paused — availability score drops to 0.0, memory ceiling drops to 0GB, no requests route to it.

How It Detects

Camera (three methods, tried in order):

  1. macOS unified logs for CoreMediaIO extension events
  2. lsof check for open handles in /Library/CoreMediaIO/
  3. Process check for VDCAssistant or AppleCameraAssistant

Microphone (two methods):

  1. ioreg query for IOAudioEngine active state
  2. lsof scan for audio input device handles

The node resumes automatically when the meeting ends. On Linux and Windows, meeting detection returns false gracefully — no pause, no error.
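The graceful cross-platform behavior can be sketched with just the last camera method (the process check); the `pgrep` invocation is an assumption about how that check might be performed, and real detection would also try the log and `lsof` methods first.

```python
import subprocess
import sys

def camera_assistant_running():
    """Camera method 3: is VDCAssistant or AppleCameraAssistant running?"""
    for name in ("VDCAssistant", "AppleCameraAssistant"):
        try:
            result = subprocess.run(["pgrep", "-x", name], capture_output=True)
        except FileNotFoundError:
            return False  # pgrep unavailable: treat as no camera
        if result.returncode == 0:
            return True
    return False

def is_meeting_active():
    """Hard-pause signal. Off macOS this degrades gracefully to False."""
    if sys.platform != "darwin":
        return False  # Linux/Windows: no pause, no error
    return camera_assistant_running()
```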

Why It Matters

A video call uses sustained CPU, memory, and network bandwidth. Running a 40GB model during a Zoom call degrades both the call and the inference. Hard-pausing on meeting detection is the difference between a system you leave running and one you constantly babysit.

App Fingerprinting

Privacy-First Design

The fingerprinter never reads application names, window titles, or user content. It observes only system-level resource consumption:

Metric                  Source
---------------------   -------------------------------
CPU percent             psutil.cpu_percent()
Memory percent          psutil.virtual_memory().percent
Network bytes (delta)   psutil.net_io_counters()
Disk I/O (delta)        psutil.disk_io_counters()

Snapshots are kept in a 2-minute sliding window (24 samples at 5-second intervals).

Workload Classification

Workload    Signature
---------   --------------------------------------------------------
Idle        CPU <10%, memory <70%
Light       CPU 10–35% or memory >70%
Moderate    CPU 35–60%
Heavy       CPU 60–85%
Intensive   CPU >85%, or CPU >60% with >500KB/s sustained network

The "intensive with high network" pattern catches video calls specifically — they have a distinctive signature of sustained CPU plus high bidirectional network traffic.
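The classification table could be implemented roughly as follows. The thresholds are from the table; the evaluation order (most intensive first) and the reading of 500KB/s as 500,000 bytes/s are assumptions.

```python
def classify_workload(cpu_pct, mem_pct, net_bytes_per_s):
    """Classify intensity from system-level metrics only (no app names)."""
    # Sustained CPU plus high network: the video-call signature.
    if cpu_pct > 85.0 or (cpu_pct > 60.0 and net_bytes_per_s > 500_000):
        return "intensive"
    if cpu_pct > 60.0:
        return "heavy"
    if cpu_pct > 35.0:
        return "moderate"
    if cpu_pct >= 10.0 or mem_pct > 70.0:
        return "light"
    return "idle"
```

Checking the network-qualified branch before plain "heavy" is what lets a 70%-CPU video call outrank a 70%-CPU compile.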

CPU Trend

The fingerprinter computes a trend (-1.0 to +1.0) by comparing the average CPU of the first half of the recent snapshot window with that of the second half.

This trend feeds into the capacity learner as the 20%-weighted trend signal.
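A sketch of the sliding window and the half-versus-half comparison. The window size matches the 24-sample figure above; normalizing the difference of averages by 100 percentage points is an assumption.

```python
from collections import deque

# 2-minute sliding window: 24 samples at 5-second intervals.
cpu_window = deque(maxlen=24)

def cpu_trend(samples):
    """Return -1.0 (falling) .. +1.0 (rising); 0.0 with too little data."""
    samples = list(samples)
    if len(samples) < 4:
        return 0.0
    half = len(samples) // 2
    first = sum(samples[:half]) / half
    second = sum(samples[half:]) / (len(samples) - half)
    # Difference of average CPU percentages, normalized into [-1, 1].
    return max(-1.0, min(1.0, (second - first) / 100.0))
```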

How It Fits Together

Every 5 seconds (heartbeat interval):

1. App Fingerprinter collects snapshot     -- CPU, mem, net, disk
2. Meeting Detector checks camera + mic    -- boolean
3. Capacity Learner computes availability  -- 0.0 to 1.0
   |-- If meeting detected         --> hard pause (0GB ceiling)
   |-- If sustained high CPU       --> reduced ceiling (16GB)
   |-- If bootstrapping            --> no capacity contributed
   |-- Otherwise                   --> learned score + ceiling
4. Capacity info included in heartbeat     -- sent to router
5. Router's scoring engine uses ceiling    -- respects dynamic capacity

The router never sends requests that would exceed the memory ceiling. If a MacBook's ceiling drops from 64GB to 0GB because a meeting started, the router immediately stops routing there and rebalances pending requests.
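The capacity fields piggyback on the regular heartbeat; a sketch of what that payload might look like. The `ceiling_gb` field name is from this guide; every other field name here is illustrative.

```python
import time

def build_heartbeat(node_id, score, mode, ceiling_gb, meeting_active):
    """Assemble the capacity portion of a 5-second heartbeat."""
    return {
        "node_id": node_id,
        "timestamp": time.time(),
        # A detected meeting is a hard pause: it overrides the learned values.
        "ceiling_gb": 0.0 if meeting_active else ceiling_gb,
        "availability": 0.0 if meeting_active else score,
        "mode": "paused" if meeting_active else mode,
    }
```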

Enabling Capacity Learning

Capacity learning is opt-in. Dedicated servers should leave it disabled — they always run at full capacity.

# On a laptop that's also used for daily work
FLEET_NODE_ENABLE_CAPACITY_LEARNING=true herd-node

Variable                              Default            Description
-----------------------------------   ----------------   --------------------------------
FLEET_NODE_ENABLE_CAPACITY_LEARNING   false              Enable the adaptive system
FLEET_NODE_DATA_DIR                   ~/.fleet-manager   Where learner state is persisted

Next Steps