Headroom Developer Guide

What this guide covers

Headroom is a context-compression layer that shrinks everything an AI agent reads (tool outputs, logs, RAG chunks, files, and conversation history) before it reaches the model. This guide explains what it does, the four ways to run it (library, proxy, agent wrap, MCP), when it pays off, and how it fits together with Slim Tools and ECC in a single context-economy stack.

Source scope as of July 1, 2026

Based on the public Headroom repository (headroomlabs-ai/headroom, Apache-2.0) and its documentation site. All commands, package names, and integration snippets below are quoted from the project's own README. The performance numbers are Headroom's own reported figures: treat them as vendor claims and reproduce them on your own workload before relying on them. Package versions, CLI flags, and the agent matrix change frequently, so re-check the repository and documentation site before copying anything from this guide.

1. The short version

Headroom sits between your agent and the LLM provider and compresses the payload in flight. According to the project, this cuts 60–95% of tokens on real agent workloads while keeping answer quality steady on standard benchmarks.
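Because the 60–95% figure is a vendor claim, it helps to have a fixed way of measuring savings on your own payloads. The sketch below is one minimal way to do that. It does not call Headroom at all: `rough_token_count` is a hypothetical stand-in for a real tokenizer, and the "compressed" string is hand-written for illustration, so swap in your provider's token counts and real before/after payloads.

```python
import re


def rough_token_count(text: str) -> int:
    """Crude token estimate: counts words and punctuation marks.

    Assumption: this is only a stand-in. Use your model's real tokenizer
    (or the usage numbers your provider reports) for actual measurements.
    """
    return len(re.findall(r"\w+|[^\w\s]", text))


def savings_pct(tokens_before: int, tokens_after: int) -> float:
    """Percentage of tokens removed, rounded to one decimal place."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return round(100.0 * (tokens_before - tokens_after) / tokens_before, 1)


# Illustrative payloads only: a repetitive log and a hand-made summary of it.
original = "INFO step 1 ok\n" * 50 + "ERROR step 51 failed: timeout\n"
compressed = "[50 x INFO step N ok]\nERROR step 51 failed: timeout\n"

before = rough_token_count(original)
after = rough_token_count(compressed)
print(f"{before} -> {after} tokens, {savings_pct(before, after)}% saved")
```

Run the same calculation over a representative sample of real agent sessions rather than a single payload, since savings depend heavily on how repetitive the content is.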

Pick your entry point by how much you want to touch your code:

You want to…	Use	Code changes
Cut tokens with zero code changes, any language	Proxy (`headroom proxy`)	None
Route a coding agent (Claude Code, Cursor, Aider, …) through it	Agent wrap (`headroom wrap <tool>`)	None
Compress inside your own app	Library (`compress(messages)`)	Small
Expose compression to any MCP client	MCP server (`headroom mcp install`)	None

For most people evaluating Headroom, start with proxy or agent wrap — both are drop-in and let you measure savings before touching application code.

2. What it compresses

Headroom is content-aware. A router inspects each piece of context and picks a specialized compressor:

Tool outputs — command results, API JSON, search hits
Logs — verbose CI, SRE, and debugging output
RAG chunks — retrieved documents before they hit the prompt
Files — source code (AST-aware) and structured data
Conversation history — the growing transcript
Output tokens — optionally, what the model writes back (see Proxy and agent wrapping)

It runs locally — the project's central privacy claim is that your data stays on your machine.
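To make the "router picks a specialized compressor" idea concrete, here is a toy sketch of content-aware dispatch. Everything in it is hypothetical: the `classify` heuristics, the compressor functions, and the `route` name are illustrations of the pattern, not Headroom's router or its compressors.

```python
import json
import re
from itertools import groupby
from typing import Callable

LOG_LINE = re.compile(r"^(DEBUG|INFO|WARN|WARNING|ERROR)\b")


def classify(text: str) -> str:
    """Guess the content kind with cheap heuristics (illustrative only)."""
    stripped = text.strip()
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass  # looked like JSON but is not; fall through to other checks
    lines = stripped.splitlines()
    log_like = sum(1 for line in lines if LOG_LINE.match(line))
    if lines and log_like / len(lines) >= 0.5:
        return "log"
    return "text"


def compress_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_log(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line with a count."""
    out = []
    for line, run in groupby(text.strip().splitlines()):
        n = sum(1 for _ in run)
        out.append(f"{line} (x{n})" if n > 1 else line)
    return "\n".join(out)


# Hypothetical registry: one compressor per content kind, pass-through by default.
COMPRESSORS: dict[str, Callable[[str], str]] = {
    "json": compress_json,
    "log": compress_log,
    "text": lambda t: t,
}


def route(text: str) -> tuple[str, str]:
    """Return (detected kind, compressed text)."""
    kind = classify(text)
    return kind, COMPRESSORS[kind](text)


print(route('{ "a": 1,  "b": [1, 2] }'))
print(route("INFO ok\nINFO ok\nERROR boom"))
```

The design point is the separation: detection is cheap and runs on every payload, while each compressor can be as specialized as its content type allows.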

3. Modes at a glance

Library     →  compress(messages) inline in Python or TypeScript
Proxy       →  headroom proxy --port 8787   (drop-in, any language)
Agent wrap  →  headroom wrap claude|codex|cursor|aider|…   (one command)
MCP         →  headroom_compress / headroom_retrieve / headroom_stats

Each mode is covered in Architecture and modes. Reversible compression (CCR) means originals are cached locally and the model can pull them back on demand via a retrieval tool, so nothing is permanently discarded while the cached original is still available (originals are retrievable within a TTL).
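The reversible pattern can be sketched in a few lines: keep the original in a local cache under a key, hand the model a shortened stand-in that carries the key, and serve the original back until a TTL expires. The class below is a hypothetical illustration of that idea only. `ReversibleStore`, its method names, the `[ref:…]` marker format, and the truncation strategy are all assumptions, not Headroom's CCR implementation or its `headroom_retrieve` tool.

```python
import hashlib
import time
from typing import Callable, Optional


class ReversibleStore:
    """Keeps originals for ttl_seconds so a shortened stand-in can be expanded."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: dict[str, tuple[float, str]] = {}

    def compress(self, text: str, keep_chars: int = 40) -> str:
        """Cache the original and return a truncated preview tagged with a key."""
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self._items[key] = (self._clock() + self._ttl, text)
        return f"{text[:keep_chars]}... [ref:{key}]"

    def retrieve(self, key: str) -> Optional[str]:
        """Return the original, or None if the key is unknown or has expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, text = entry
        if self._clock() >= expires_at:
            del self._items[key]  # drop expired originals on access
            return None
        return text


store = ReversibleStore(ttl_seconds=300)
stub = store.compress("line of verbose tool output\n" * 100)
key = stub.split("[ref:")[1].rstrip("]")
print(len(stub), "chars sent;", len(store.retrieve(key)), "chars recoverable")
```

Note what the TTL implies: once an entry expires, the stand-in is all that remains, so retention needs to be at least as long as the sessions that may ask for the original.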

4. When it pays off — and when to skip

Good fit if you:

run AI coding agents daily and want savings without rewriting anything,
work across several agents and want a shared memory/context store,
need compression to be reversible (originals retrievable within a TTL).

Skip or defer if you:

rely only on a single provider's native compaction and need no cross-agent memory,
run in a locked-down sandbox where a local proxy process cannot run.

5. Directory map

docs/ai/tools/headroom/
├── index.mdx                        (this page)
├── architecture-and-modes.mdx       how it works + the four modes in depth
├── install-and-cli.mdx              install, extras, CLI reference
├── library-and-sdk-integration.mdx  compress(), SDK wrap, framework adapters
├── proxy-and-agent-wrapping.mdx     proxy, headroom wrap, output-token reduction
├── context-economy-stack.mdx        Slim Tools + Headroom + when ECC fits
└── troubleshooting.mdx              doctor, quality checks, common blockers

6. Where this fits

Headroom solves one layer of a bigger problem — the tokens flowing through an agent. It is complementary to Slim Tools, which reduces the tool surface an agent has to reason about, and to ECC-style workflow discipline. The context-economy stack page lays out how the three combine and when to add each.
