Proxy and Agent Wrapping
The agent matrix, flags, and environment variables below are quoted from the Headroom README. Agent coverage changes often โ check the proxy docs for the current list.
These are the two zero-code-change ways to run Headroom. Both are the recommended starting point for evaluating savings.
1. Proxy modeโ
headroom proxy --port 8787
This starts a local, OpenAI-compatible proxy. Point any OpenAI-compatible client at it and every request is compressed before it goes upstream โ no application changes, any language. It is also the surface that reports live savings via headroom dashboard.
2. Agent wrappingโ
headroom wrap <tool> wires a supported coding agent through Headroom in one command; headroom unwrap <tool> reverses durable wrapping.
headroom wrap claude
headroom unwrap claude
Agent compatibility matrixโ
| Agent | headroom wrap | Notes |
|---|---|---|
| Claude Code | โ | flags: --memory ยท --code-graph ยท --1m ยท --tool-search |
| Codex | โ | shares memory with Claude |
| Cursor | Manual setup | starts proxy and prints base URLs for Cursor settings |
| Aider | โ | starts proxy + launches |
| Copilot CLI | โ | starts proxy + launches |
| OpenClaw | โ | installs as a ContextEngine plugin |
| OpenCode | โ | injects config ยท starts proxy + launches |
| Cline | โ | starts proxy + injects config |
| Continue | โ | starts proxy + injects config |
| Goose | โ | starts proxy + launches |
| OpenHands | โ | starts proxy + launches |
| Mistral Vibe | โ | starts proxy + launches |
| Cortex Code | Library only | 60โ65% savings (library mode; no wrap) |
Any OpenAI-compatible client works via headroom proxy. MCP-native clients use headroom mcp install. Durable wrapping can be undone for claude, copilot, codex, opencode, and openclaw.
3. Output-token reductionโ
Everything above shrinks the prompt you send. Headroom can also trim what the model writes back โ preambles, restated code, and deep "thinking" on routine steps โ which matters because output tokens are billed at a premium on large models.
It is off by default. Turn it on at the proxy:
export HEADROOM_OUTPUT_SHAPER=1
headroom proxy --port 8787
Two mechanisms, per the README:
- Verbosity steering โ appends a short "be terse, don't restate context" note to the end of the system prompt, so your prompt cache still hits.
- Effort routing โ when a turn is just the model resuming after a tool result, it dials thinking effort down; new questions and errors keep full effort.
These are read live per request. Set HEADROOM_OUTPUT_SHAPER before headroom wrap, which hot-syncs current settings to the running proxy via a loopback admin call (no restart). On a shared proxy the overrides are global โ the last explicit setting wins.
Learn your preferred tersenessโ
headroom learn --verbosity # preview what it found (dry run)
headroom learn --verbosity --apply # save it; the proxy uses it from now on
Measure the output savings honestlyโ
Output savings are counterfactual โ Headroom never sees what the model would have written โ so it reports an estimate with a confidence range rather than a made-up number:
headroom output-savings
# Reduction: 31.7% (95% CI 27.7% โฆ 35.7%) [estimated]
For a measured number, hold out a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1 leaves 10% of conversations unshaped, and the dashboard labels the result measured or estimated.
4. GitHub Copilot subscription mode (brief)โ
Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:
headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o
For GitHub Enterprise Server or custom-domain deployments, set GITHUB_COPILOT_ENTERPRISE_DOMAIN before launching. The README notes that non-macOS keychain reuse paths are implemented or planned but not fully vetted, so for Docker/CI prefer passing an explicit GITHUB_COPILOT_TOKEN. See the README for the full flow.