Proxy and Agent Wrapping
The agent matrix, flags, and environment variables below are quoted from the Headroom README. Agent coverage changes often — check the proxy docs for the current list.
These are the two zero-code-change ways to run Headroom. Both are the recommended starting point for evaluating savings.
1. Proxy mode​
headroom proxy --port 8787
This starts a local, OpenAI-compatible proxy. Point any OpenAI-compatible client at it and every request is compressed before it goes upstream — no application changes, any language. It is also the surface that reports live savings via headroom dashboard.
2. Agent wrapping​
headroom wrap <tool> wires a supported coding agent through Headroom in one command; headroom unwrap <tool> reverses durable wrapping.
headroom wrap claude
headroom unwrap claude
Agent compatibility matrix​
| Agent | headroom wrap | Notes |
|---|---|---|
| Claude Code | ✅ | flags: --memory · --code-graph · --1m · --tool-search |
| Codex | ✅ | shares memory with Claude |
| Cursor | Manual setup | starts proxy and prints base URLs for Cursor settings |
| Aider | ✅ | starts proxy + launches |
| Copilot CLI | ✅ | starts proxy + launches |
| OpenClaw | ✅ | installs as a ContextEngine plugin |
| OpenCode | ✅ | injects config · starts proxy + launches |
| Cline | ✅ | starts proxy + injects config |
| Continue | ✅ | starts proxy + injects config |
| Goose | ✅ | starts proxy + launches |
| OpenHands | ✅ | starts proxy + launches |
| Mistral Vibe | ✅ | starts proxy + launches |
| Cortex Code | Library only | 60–65% savings (library mode; no wrap) |
Any OpenAI-compatible client works via headroom proxy. MCP-native clients use headroom mcp install. Durable wrapping can be undone for claude, copilot, codex, opencode, and openclaw.
3. Output-token reduction​
Everything above shrinks the prompt you send. Headroom can also trim what the model writes back — preambles, restated code, and deep "thinking" on routine steps — which matters because output tokens are billed at a premium on large models.
It is off by default. Turn it on at the proxy:
export HEADROOM_OUTPUT_SHAPER=1
headroom proxy --port 8787
Two mechanisms, per the README:
- Verbosity steering — appends a short "be terse, don't restate context" note to the end of the system prompt, so your prompt cache still hits.
- Effort routing — when a turn is just the model resuming after a tool result, it dials thinking effort down; new questions and errors keep full effort.
These are read live per request. Set HEADROOM_OUTPUT_SHAPER before headroom wrap, which hot-syncs current settings to the running proxy via a loopback admin call (no restart). On a shared proxy the overrides are global — the last explicit setting wins.
Learn your preferred terseness​
headroom learn --verbosity # preview what it found (dry run)
headroom learn --verbosity --apply # save it; the proxy uses it from now on
Measure the output savings honestly​
Output savings are counterfactual — Headroom never sees what the model would have written — so it reports an estimate with a confidence range rather than a made-up number:
headroom output-savings
# Reduction: 31.7% (95% CI 27.7% … 35.7%) [estimated]
For a measured number, hold out a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1 leaves 10% of conversations unshaped, and the dashboard labels the result measured or estimated.
4. GitHub Copilot subscription mode (brief)​
Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:
headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o
For GitHub Enterprise Server or custom-domain deployments, set GITHUB_COPILOT_ENTERPRISE_DOMAIN before launching. The README notes that non-macOS keychain reuse paths are implemented or planned but not fully vetted, so for Docker/CI prefer passing an explicit GITHUB_COPILOT_TOKEN. See the README for the full flow.