Headroom Developer Guide

What this guide covers

Headroom is a context-compression layer that shrinks everything an AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the model. This guide explains what it does, the four ways to run it (library, proxy, agent-wrap, MCP), when it pays off, and how it fits together with Slim Tools and ECC in a single context-economy stack.

Source scope as of July 1, 2026

Based on the public Headroom repository (headroomlabs-ai/headroom, Apache-2.0) and its documentation site. All commands, package names, and integration snippets below are quoted from the project's own README. The performance numbers are Headroom's own reported figures — treat them as vendor claims and reproduce them on your own workload before relying on them. Package versions, CLI flags, and the agent matrix change frequently, so re-check the linked sources.

1. The short version

Headroom sits between your agent and the LLM provider and compresses the payload in flight. According to the project, this cuts 60–95% of tokens on real agent workloads while keeping answer quality steady on standard benchmarks.
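Because the 60–95% figure is the project's own claim, it helps to have a neutral way to score your own runs. The sketch below is not part of Headroom: `token_savings` and `summarize` are hypothetical helper names, and the token counts are assumed to come from whatever tokenizer or provider usage field you already trust, measured once without and once with compression.

```python
def token_savings(tokens_before: int, tokens_after: int) -> float:
    """Return the fraction of tokens removed (0.6 means 60% fewer tokens).

    A negative result means the payload grew instead of shrinking.
    """
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    if tokens_after < 0:
        raise ValueError("tokens_after must not be negative")
    return 1 - tokens_after / tokens_before


def summarize(samples: list[tuple[int, int]]) -> dict[str, float]:
    """Aggregate (before, after) token counts collected from several requests."""
    if not samples:
        raise ValueError("need at least one sample")
    total_before = sum(before for before, _ in samples)
    total_after = sum(after for _, after in samples)
    per_request = [token_savings(before, after) for before, after in samples]
    return {
        # Weighted by request size, which is what the bill reflects.
        "overall": token_savings(total_before, total_after),
        "worst": min(per_request),
        "best": max(per_request),
    }
```

Reporting the worst and best request alongside the overall figure shows whether the savings are spread evenly or concentrated in a few very large payloads.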

Pick your entry point by how much you want to touch your code:

You want to…                                                    | Use                               | Code changes
Cut tokens with zero code changes, any language                 | Proxy (headroom proxy)            | None
Route a coding agent (Claude Code, Cursor, Aider, …) through it | Agent wrap (headroom wrap <tool>) | None
Compress inside your own app                                    | Library (compress(messages))      | Small
Expose compression to any MCP client                            | MCP server (headroom mcp install) | None

For most people evaluating Headroom, start with proxy or agent wrap — both are drop-in and let you measure savings before touching application code.

2. What it compresses

Headroom is content-aware. A router inspects each piece of context and picks a specialized compressor:

  • Tool outputs — command results, API JSON, search hits
  • Logs — verbose CI, SRE, and debugging output
  • RAG chunks — retrieved documents before they hit the prompt
  • Files — source code (AST-aware) and structured data
  • Conversation history — the growing transcript
  • Output tokens — optionally, what the model writes back (see Proxy and agent wrapping)
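The "router picks a specialized compressor" idea can be illustrated with a toy dispatcher. This is a conceptual sketch only, not Headroom's router or its compressors: every name here (`detect_kind`, `route`, `compress_json`, `compress_log`) is hypothetical, and the two strategies shown (whitespace-free JSON, collapsing repeated log lines) are simple stand-ins chosen because they are easy to verify.

```python
import json
from itertools import groupby
from typing import Callable


def compress_json(text: str) -> str:
    """Re-serialize JSON without insignificant whitespace."""
    return json.dumps(json.loads(text), separators=(",", ":"))


def compress_log(text: str) -> str:
    """Collapse runs of identical consecutive lines into one line plus a count."""
    out = []
    for line, group in groupby(text.splitlines()):
        count = sum(1 for _ in group)
        out.append(line if count == 1 else f"{line} (x{count})")
    return "\n".join(out)


def passthrough(text: str) -> str:
    """Leave content untouched when no specialized strategy applies."""
    return text


def detect_kind(text: str) -> str:
    """Very rough content sniffing: valid JSON, multi-line log, or other."""
    if text.lstrip().startswith(("{", "[")):
        try:
            json.loads(text)
            return "json"
        except ValueError:
            pass  # looked like JSON but is not, e.g. "[INFO] ..." log lines
    if text.count("\n") >= 2:
        return "log"
    return "other"


COMPRESSORS: dict[str, Callable[[str], str]] = {
    "json": compress_json,
    "log": compress_log,
    "other": passthrough,
}


def route(text: str) -> tuple[str, str]:
    """Return the detected kind and the compressed text."""
    kind = detect_kind(text)
    return kind, COMPRESSORS[kind](text)
```

The design point is the dispatch table: each content type gets a strategy that exploits its structure, and anything unrecognized passes through unchanged rather than being compressed blindly.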

It runs locally — the project's central privacy claim is that your data stays on your machine.

3. Modes at a glance

Library     →  compress(messages) inline in Python or TypeScript
Proxy       →  headroom proxy --port 8787 (drop-in, any language)
Agent wrap  →  headroom wrap claude|codex|cursor|aider|… (one command)
MCP         →  headroom_compress / headroom_retrieve / headroom_stats

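Each mode is covered in Architecture and modes. Reversible compression (CCR) means originals are cached locally and the model can pull them back on demand via a retrieval tool. Compression is therefore recoverable rather than destructive, as long as the cached original has not expired (originals are retrievable within a TTL, see section 4).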
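To make the reversibility idea concrete, here is a minimal sketch of a local cache that hands out a short stub plus a retrieval key and returns the original only while a TTL has not expired. It illustrates the concept and is not Headroom's CCR implementation: the class name `ReversibleStore`, the truncation strategy, the key format, and the injectable clock are all assumptions made for this example.

```python
import hashlib
import time
from typing import Callable, Optional


class ReversibleStore:
    """Toy in-memory cache: compress to a stub, retrieve the original by key."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: dict[str, tuple[float, str]] = {}

    def compress(self, original: str, keep_chars: int = 40) -> tuple[str, str]:
        """Store the original and return (stub, key)."""
        key = hashlib.sha256(original.encode("utf-8")).hexdigest()[:12]
        self._entries[key] = (self._clock() + self._ttl, original)
        stub = f"{original[:keep_chars]}... [truncated; key={key}]"
        return stub, key

    def retrieve(self, key: str) -> Optional[str]:
        """Return the original, or None if the key is unknown or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, original = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries are dropped lazily
            return None
        return original
```

The stub is what would travel to the model; the key is what a retrieval tool would pass back. Once the TTL lapses the original is gone, which is why the TTL should be chosen to outlast the conversation that may need it.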

4. When it pays off, and when to skip

Good fit if you:

  • run AI coding agents daily and want savings without rewriting anything,
  • work across several agents and want a shared memory/context store,
  • need compression to be reversible (originals retrievable within a TTL).

Skip or defer if you:

  • rely only on a single provider's native compaction and need no cross-agent memory,
  • run in a locked-down sandbox where a local proxy process cannot run.

5. Directory map

docs/ai/tools/headroom/
├── index.mdx                          (this page)
├── architecture-and-modes.mdx         how it works + the four modes in depth
├── install-and-cli.mdx                install, extras, CLI reference
├── library-and-sdk-integration.mdx    compress(), SDK wrap, framework adapters
├── proxy-and-agent-wrapping.mdx       proxy, headroom wrap, output-token reduction
├── context-economy-stack.mdx          Slim Tools + Headroom + when ECC fits
└── troubleshooting.mdx                doctor, quality checks, common blockers

6. Where this fits

Headroom solves one layer of a bigger problem — the tokens flowing through an agent. It is complementary to Slim Tools, which reduces the tool surface an agent has to reason about, and to ECC-style workflow discipline. The context-economy stack page lays out how the three combine and when to add each.

7. Primary references