Headroom Developer Guide
Headroom is a context-compression layer that shrinks everything an AI agent reads (tool outputs, logs, RAG chunks, files, and conversation history) before it reaches the model. This guide explains what it does, the four ways to run it (library, proxy, agent wrap, MCP), when it pays off, and how it fits together with Slim Tools and ECC in a single context-economy stack.
This guide is based on the public Headroom repository (headroomlabs-ai/headroom, Apache-2.0) and its documentation site. All commands, package names, and integration snippets below are quoted from the project's own README. The performance numbers are Headroom's own reported figures; treat them as vendor claims and reproduce them on your own workload before relying on them. Package versions, CLI flags, and the agent matrix change frequently, so re-check the linked sources.
1. The short version
Headroom sits between your agent and the LLM provider and compresses the payload in flight. According to the project, this cuts 60–95% of tokens on real agent workloads while keeping answer quality steady on standard benchmarks.
Pick your entry point by how much you want to touch your code:
| You want to… | Use | Code changes |
|---|---|---|
| Cut tokens with zero code changes, any language | Proxy (headroom proxy) | None |
| Route a coding agent (Claude Code, Cursor, Aider, …) through it | Agent wrap (headroom wrap <tool>) | None |
| Compress inside your own app | Library (compress(messages)) | Small |
| Expose compression to any MCP client | MCP server (headroom mcp install) | None |
If you are evaluating Headroom, start with the proxy or agent wrap: both are drop-in and let you measure savings before you touch application code.
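Whichever entry point you start with, the figure worth tracking is aggregate savings across many requests, not the best single request. The sketch below assumes you already have before/after token counts per request (for example from your provider's usage data or your own tokenizer). The counts are invented for illustration, not Headroom measurements, and tokens_saved is a hypothetical helper, not part of Headroom.

```python
def tokens_saved(before: int, after: int) -> tuple[int, float]:
    """Return (tokens saved, percent saved) given token counts before and after compression."""
    if before <= 0:
        raise ValueError("before must be a positive token count")
    saved = before - after
    return saved, 100.0 * saved / before


# Invented (before, after) token counts for three requests: 60%, 75%, and 95% savings.
requests = [(40_000, 16_000), (12_000, 3_000), (80_000, 4_000)]

# Aggregate raw counts first; averaging per-request percentages would give
# every request equal weight regardless of its size.
total_before = sum(before for before, _ in requests)
total_after = sum(after for _, after in requests)
saved, pct = tokens_saved(total_before, total_after)
print(f"saved {saved} of {total_before} tokens ({pct:.1f}%)")  # saved 109000 of 132000 tokens (82.6%)
```

Here the simple average of the three percentages would be about 76.7%, while the token-weighted figure is 82.6%, because the largest request also compressed best. The weighted number is the one that maps to your bill.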
2. What it compresses
Headroom is content-aware. A router inspects each piece of context and picks a specialized compressor:
- Tool outputs: command results, API JSON, search hits
- Logs: verbose CI, SRE, and debugging output
- RAG chunks: retrieved documents before they hit the prompt
- Files: source code (AST-aware) and structured data
- Conversation history: the growing transcript
- Output tokens: optionally, what the model writes back (see Proxy and agent wrapping)
It runs locally; the project's central privacy claim is that your data stays on your machine.
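The routing idea can be pictured as a content classifier in front of a table of compressors. This is a minimal sketch of that dispatch pattern, not Headroom's router: classify, route, compress_json, and compress_log are hypothetical names, the detection heuristics are deliberately crude, and both compressors are lossy toy stand-ins for the specialized (for example, AST-aware) compressors the project describes.

```python
import json
import re
from typing import Callable, Dict


def compress_json(text: str) -> str:
    """Toy JSON compressor: keep top-level keys of an object, or the length of an array."""
    data = json.loads(text)
    if isinstance(data, dict):
        return json.dumps({"keys": sorted(data.keys())})
    if isinstance(data, list):
        return json.dumps({"items": len(data)})
    return text


def compress_log(text: str) -> str:
    """Toy log compressor: keep error/warning/failure lines, count the rest."""
    lines = text.splitlines()
    kept = [ln for ln in lines if re.search(r"\b(ERROR|WARN(ING)?|FAIL(ED)?)\b", ln)]
    omitted = len(lines) - len(kept)
    return "\n".join(kept + [f"[{omitted} other lines omitted]"])


def passthrough(text: str) -> str:
    """Unrecognized content is left untouched."""
    return text


def classify(text: str) -> str:
    """Crude content detection: valid JSON first, then log-level keywords."""
    stripped = text.lstrip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    if re.search(r"\b(DEBUG|INFO|WARN(ING)?|ERROR)\b", text):
        return "log"
    return "other"


ROUTES: Dict[str, Callable[[str], str]] = {
    "json": compress_json,
    "log": compress_log,
    "other": passthrough,
}


def route(text: str) -> str:
    """Pick a compressor based on the detected content type and apply it."""
    return ROUTES[classify(text)](text)


log = "\n".join(["INFO step 1", "INFO step 2", "ERROR disk full", "INFO step 3"])
print(route(log))                         # prints "ERROR disk full" then "[3 other lines omitted]"
print(route('{"b": 1, "a": [1, 2, 3]}'))  # {"keys": ["a", "b"]}
```

Keeping detection separate from compression means a new content type needs only a new classifier branch and a new table entry, and anything unrecognized passes through unchanged, which is the safe default when you cannot tell what a payload is.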
3. Modes at a glance
- Library: compress(messages) inline in Python or TypeScript
- Proxy: headroom proxy --port 8787 (drop-in, any language)
- Agent wrap: headroom wrap claude|codex|cursor|aider|… (one command)
- MCP: headroom_compress / headroom_retrieve / headroom_stats
Each mode is covered in depth on the Architecture and modes page. Reversible compression (CCR) means originals are cached locally and the model can pull them back on demand through a retrieval tool, so compression is effectively lossless as long as the cached original has not expired.
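CCR turns compression into a pointer rather than a deletion: the model sees a short stub, and the full original stays on your machine until it expires. The sketch below models only that pattern. ReversibleStore, the stub text, the 12-character hash key, and the 600-second default TTL are assumptions for illustration; its compress/retrieve/stats methods loosely echo the headroom_compress, headroom_retrieve, and headroom_stats tool names but are not their implementation, and Headroom's real cache format and TTL may differ.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class ReversibleStore:
    """Hypothetical CCR-style cache: originals stay local, the model sees a short stub."""

    def __init__(self, ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._items: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, original)
        self.hits = 0
        self.misses = 0

    def compress(self, original: str, summary: str) -> str:
        """Cache the original under a content hash and return a stub pointing at it."""
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._items[key] = (self.clock() + self.ttl, original)
        return f"{summary} [full text: retrieve('{key}')]"

    def retrieve(self, key: str) -> Optional[str]:
        """Return the original if it is still within its TTL, else None."""
        entry = self._items.get(key)
        if entry is None or entry[0] < self.clock():
            self._items.pop(key, None)  # drop the expired entry, if any
            self.misses += 1
            return None
        self.hits += 1
        return entry[1]

    def stats(self) -> Dict[str, int]:
        """Counts of cached originals and retrieval outcomes."""
        return {"stored": len(self._items), "hits": self.hits, "misses": self.misses}


# Usage with a fake clock so the TTL can be demonstrated instantly.
now = [0.0]
store = ReversibleStore(ttl_seconds=60, clock=lambda: now[0])
stub = store.compress("<5,000 lines of CI output>", "CI log: 2 failing tests")
key = stub.split("'")[1]
print(store.retrieve(key))  # <5,000 lines of CI output>
now[0] = 120.0              # jump past the 60-second TTL
print(store.retrieve(key))  # None
print(store.stats())        # {'stored': 0, 'hits': 1, 'misses': 1}
```

The TTL is the trade-off knob: reversibility holds only while the original is still cached, so it should outlast the window in which the model might need to look back.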
4. When it pays off, and when to skip
Good fit if you:
- run AI coding agents daily and want savings without rewriting anything,
- work across several agents and want a shared memory/context store,
- need compression to be reversible (originals retrievable within a TTL).
Skip or defer if you:
- rely only on a single provider's native compaction and need no cross-agent memory,
- run in a locked-down sandbox where a local proxy process cannot run.
5. Directory map
docs/ai/tools/headroom/
├── index.mdx                          (this page)
├── architecture-and-modes.mdx         how it works + the four modes in depth
├── install-and-cli.mdx                install, extras, CLI reference
├── library-and-sdk-integration.mdx    compress(), SDK wrap, framework adapters
├── proxy-and-agent-wrapping.mdx       proxy, headroom wrap, output-token reduction
├── context-economy-stack.mdx          Slim Tools + Headroom + when ECC fits
└── troubleshooting.mdx                doctor, quality checks, common blockers
6. Where this fits
Headroom solves one layer of a bigger problem: the tokens flowing through an agent. It is complementary to Slim Tools, which reduces the tool surface an agent has to reason about, and to ECC-style workflow discipline. The context-economy stack page lays out how the three combine and when to add each.