Troubleshooting and Verification
Diagnostics, environment variables, and error strings below are quoted from the Headroom README. For anything not covered here, see the Limitations and Configuration docs.
1. Confirm it is working
headroom doctor # health check — confirms routing is working
headroom perf # performance / savings summary
headroom dashboard # live savings (proxy must be running)
If doctor reports routing is not active, the most common causes are: the proxy is not running, the client is not pointed at the proxy port, or you installed the npm package (SDK only — no CLI). See Install and CLI.
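When doctor is unavailable or its output is unclear, the same three causes can be narrowed down by hand. The following is a minimal sketch, not part of Headroom: `port_is_open` and `likely_causes` are hypothetical helper names, and the address `127.0.0.1:8787` assumes the proxy was started with `--port 8787` as in the example later on this page.

```python
import shutil
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def likely_causes(cli_found: bool, proxy_listening: bool) -> list[str]:
    """Map two local checks onto the causes listed above (hypothetical helper)."""
    causes = []
    if not cli_found:
        causes.append("headroom CLI not on PATH (npm package is SDK only)")
    if not proxy_listening:
        causes.append("nothing is listening on the proxy port")
    if cli_found and proxy_listening:
        # Both local checks pass, so the remaining candidate is client config.
        causes.append("client may not be pointed at the proxy port")
    return causes


if __name__ == "__main__":
    # Assumption: the proxy was started on localhost:8787.
    found = shutil.which("headroom") is not None
    listening = port_is_open("127.0.0.1", 8787)
    for cause in likely_causes(found, listening):
        print("check:", cause)
```

A listening port only shows that some process is bound there; it does not prove the process is Headroom, so treat this as a first filter before re-running headroom doctor.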
2. Verify quality did not regress
Compression is only useful if answers stay correct. Headroom's own position is that quality holds on standard benchmarks — the README reports GSM8K ±0.000 and TruthfulQA +0.030 on 100-item runs, with SQuAD v2 and BFCL around 97%. These are the project's figures; reproduce them on your own workload:
python -m headroom.evals suite --tier 1
Two safety nets are built in:
- Reversible compression (CCR): if the model needs detail that was compressed away, it retrieves the original via the retrieval tool. Nothing is permanently lost within the configured TTL.
- Output holdout: `export HEADROOM_OUTPUT_HOLDOUT=0.1` keeps 10% of conversations unshaped as a control group, so you can compare shaped vs. unshaped instead of trusting an estimate.
If you suspect a specific compressed field caused a wrong answer, that is the case to reproduce and report upstream — do not assume the compression is faithful just because tokens dropped.
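To make the shaped vs. unshaped comparison concrete, here is a minimal sketch of how the two groups could be scored once you have labeled each conversation yourself. It assumes you already have, for every conversation, whether it was shaped and whether the answer was judged correct; how you collect those labels is up to you and is not described in the README. `compare_groups` is a hypothetical helper, and the standard error uses the usual two-proportion formula.

```python
from math import sqrt


def compare_groups(records):
    """Compare answer accuracy between shaped and control conversations.

    records: iterable of (shaped: bool, correct: bool) pairs.
    Returns (shaped_accuracy, control_accuracy, difference, standard_error).
    """
    counts = {True: [0, 0], False: [0, 0]}  # shaped -> [correct, total]
    for shaped, correct in records:
        counts[shaped][0] += int(correct)
        counts[shaped][1] += 1
    (s_ok, s_n), (c_ok, c_n) = counts[True], counts[False]
    if s_n == 0 or c_n == 0:
        raise ValueError("need at least one conversation in each group")
    p_s, p_c = s_ok / s_n, c_ok / c_n
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_s * (1 - p_s) / s_n + p_c * (1 - p_c) / c_n)
    return p_s, p_c, p_s - p_c, se


if __name__ == "__main__":
    # Hypothetical data: 90 shaped (81 correct), 10 control (9 correct).
    data = (
        [(True, True)] * 81 + [(True, False)] * 9
        + [(False, True)] * 9 + [(False, False)] * 1
    )
    shaped_acc, control_acc, diff, se = compare_groups(data)
    print(f"shaped={shaped_acc:.3f} control={control_acc:.3f} "
          f"diff={diff:+.3f} se={se:.3f}")
```

With a 10% holdout the control group is small, so the standard error stays wide until you have collected many conversations; a difference smaller than roughly two standard errors is not evidence either way.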
3. Common blockers
| Symptom | Likely cause | Fix |
|---|---|---|
| `headroom: command not found` | Installed via npm (SDK only) | Install the PyPI package: `pip install "headroom-ai[all]"` |
| `CERTIFICATE_VERIFY_FAILED` on install | SSL inspection; Rust build fetches rustup over an untrusted link | Install Rust first, or `pip install --only-binary headroom-ai headroom-ai` |
| `Basic Constraints of CA cert not marked critical` | Python 3.13 strict TLS rejects a non-critical corporate CA | `HEADROOM_TLS_STRICT=0 headroom proxy --port 8787` |
| Model download blocked (`huggingface.co`) | Corporate firewall | Pre-download and set `HF_HUB_OFFLINE=1`, or set `HF_ENDPOINT` to a mirror |
| ONNX runtime blocked (`cdn.pyke.io`) | Corporate firewall | `ORT_STRATEGY=system` + `ORT_LIB_LOCATION=/path/to/onnxruntime` |
| Dashboard shows $0.00 savings | Installed on Python 3.14+ (LiteLLM can't install there) | Reinstall on 3.13: `pipx reinstall headroom-ai --python python3.13` |
| Output shaping has no effect | `HEADROOM_OUTPUT_SHAPER` set after wrap | Set it before `headroom wrap`; it is off by default |
HEADROOM_TLS_STRICT=0 clears only the RFC 5280 strict flag from the TLS contexts Headroom controls. Chain validation, signature, expiry, and hostname checks all stay on — it is strictly narrower than disabling verification. Running as a pure gateway with compression disabled needs neither the ONNX nor the HuggingFace asset.
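If you launch the proxy from a script behind a corporate firewall, the environment-variable fixes from the table can be assembled in one place. This is a minimal sketch under stated assumptions: `firewalled_env` and `python_supported` are hypothetical helpers, the variable names and the `headroom proxy --port 8787` command are taken from the table above, and the library path is a placeholder you must replace.

```python
import os
import sys


def firewalled_env(base, ort_lib_location, hf_mirror=None):
    """Return a copy of `base` with the firewall workarounds from the table."""
    env = dict(base)
    # Use a locally installed ONNX runtime instead of downloading one.
    env["ORT_STRATEGY"] = "system"
    env["ORT_LIB_LOCATION"] = ort_lib_location
    if hf_mirror:
        env["HF_ENDPOINT"] = hf_mirror
    else:
        # Assumes the models were pre-downloaded on this machine.
        env["HF_HUB_OFFLINE"] = "1"
    return env


def python_supported(version=sys.version_info[:2]):
    """The table notes that LiteLLM cannot install on Python 3.14+."""
    return tuple(version) < (3, 14)


if __name__ == "__main__":
    if not python_supported():
        print("warning: Python 3.14+ detected; reinstall on 3.13")
    env = firewalled_env(os.environ, "/path/to/onnxruntime")
    # To launch the proxy with these settings (not executed here):
    #   import subprocess
    #   subprocess.run(["headroom", "proxy", "--port", "8787"], env=env)
    print(env["ORT_STRATEGY"], env["ORT_LIB_LOCATION"])
```

Building the environment as a copy keeps the workarounds scoped to the proxy process rather than leaking into the rest of the shell session.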
4. Privacy and deployment posture
Headroom runs locally and its stated model is that your data stays on your machine — this is the main reason it is a lighter trust decision than a hosted compression API. The OSS build is aimed at individual developers on a laptop.
The README frames rolling Headroom out across a team or into shared infrastructure as a different job (shared always-on deployment, centralized config, SSO, air-gapped/VPC installs). Treat an org-wide rollout as an infrastructure project, not a pip install, and confirm data-handling expectations for your environment first.
5. When to skip Headroom
Per the README, Headroom is a poor fit if you:
- rely only on a single provider's native compaction and need no cross-agent memory, or
- run in a sandboxed environment where a local proxy process cannot run.