Proxy and Agent Wrapping

Source scope as of July 1, 2026

The agent matrix, flags, and environment variables below are quoted from the Headroom README. Agent coverage changes often — check the proxy docs for the current list.

These are the two zero-code-change ways to run Headroom. Both are the recommended starting point for evaluating savings.

1. Proxy mode

headroom proxy --port 8787

This starts a local, OpenAI-compatible proxy. Point any OpenAI-compatible client at it and every request is compressed before it goes upstream — no application changes, any language. It is also the surface that reports live savings via headroom dashboard.

2. Agent wrapping

headroom wrap <tool> wires a supported coding agent through Headroom in one command; headroom unwrap <tool> reverses durable wrapping.

headroom wrap claude
headroom unwrap claude

Agent compatibility matrix

Agent	`headroom wrap`	Notes
Claude Code	✅	flags: `--memory` · `--code-graph` · `--1m` · `--tool-search`
Codex	✅	shares memory with Claude
Cursor	Manual setup	starts proxy and prints base URLs for Cursor settings
Aider	✅	starts proxy + launches
Copilot CLI	✅	starts proxy + launches
OpenClaw	✅	installs as a ContextEngine plugin
OpenCode	✅	injects config · starts proxy + launches
Cline	✅	starts proxy + injects config
Continue	✅	starts proxy + injects config
Goose	✅	starts proxy + launches
OpenHands	✅	starts proxy + launches
Mistral Vibe	✅	starts proxy + launches
Cortex Code	Library only	60–65% savings (library mode; no `wrap`)

Any OpenAI-compatible client works via headroom proxy. MCP-native clients use headroom mcp install. Durable wrapping can be undone for claude, copilot, codex, opencode, and openclaw.

3. Output-token reduction

Everything above shrinks the prompt you send. Headroom can also trim what the model writes back — preambles, restated code, and deep "thinking" on routine steps — which matters because output tokens are billed at a premium on large models.

It is off by default. Turn it on at the proxy:

export HEADROOM_OUTPUT_SHAPER=1
headroom proxy --port 8787

Two mechanisms, per the README:

Verbosity steering — appends a short "be terse, don't restate context" note to the end of the system prompt, so your prompt cache still hits.
Effort routing — when a turn is just the model resuming after a tool result, it dials thinking effort down; new questions and errors keep full effort.

Set the switches before you wrap

These are read live per request. Set HEADROOM_OUTPUT_SHAPER before headroom wrap, which hot-syncs current settings to the running proxy via a loopback admin call (no restart). On a shared proxy the overrides are global — the last explicit setting wins.

Learn your preferred terseness

headroom learn --verbosity            # preview what it found (dry run)
headroom learn --verbosity --apply    # save it; the proxy uses it from now on

Measure the output savings honestly

Output savings are counterfactual — Headroom never sees what the model would have written — so it reports an estimate with a confidence range rather than a made-up number:

headroom output-savings
# Reduction: 31.7%  (95% CI 27.7% … 35.7%)   [estimated]

For a measured number, hold out a control group: export HEADROOM_OUTPUT_HOLDOUT=0.1 leaves 10% of conversations unshaped, and the dashboard labels the result measured or estimated.

4. GitHub Copilot subscription mode (brief)

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

For GitHub Enterprise Server or custom-domain deployments, set GITHUB_COPILOT_ENTERPRISE_DOMAIN before launching. The README notes that non-macOS keychain reuse paths are implemented or planned but not fully vetted, so for Docker/CI prefer passing an explicit GITHUB_COPILOT_TOKEN. See the README for the full flow.

1. Proxy mode​

2. Agent wrapping​

Agent compatibility matrix​

3. Output-token reduction​

Learn your preferred terseness​

Measure the output savings honestly​

4. GitHub Copilot subscription mode (brief)​

5. References​