DeepSeek Guide

What is this about?

DeepSeek is one consumer app, one official (China-hosted) API, a set of MIT-licensed open weights, and a broad third-party hosting ecosystem that runs those same weights elsewhere. This guide maps those surfaces, shows the OpenAI-compatible quickstart, and is explicit about the data-residency trade-offs that matter for a regulated EU context.

Source scope as of June 23, 2026

Based on official DeepSeek sources (api-docs.deepseek.com, huggingface.co/deepseek-ai, the DeepSeek privacy policy). The current generation is DeepSeek-V4 (released April 24, 2026). The legacy API model IDs deepseek-chat and deepseek-reasoner still resolve but are scheduled to retire on 2026-07-24; migrate to the V4 IDs. Pricing and model specs change; re-check the live pricing page before quoting figures.

1. The mental model

| Surface | What it is for | Primary user |
| --- | --- | --- |
| DeepSeek app (chat.deepseek.com) | Free consumer chat with a DeepThink toggle (off = fast, on = reasoning), web search, and file upload | End users |
| DeepSeek API (platform.deepseek.com) | Pay-as-you-go developer API; OpenAI- and Anthropic-compatible | Developers |
| Open weights (Hugging Face, MIT license) | Download to self-host or fine-tune | Self-hosters, researchers, compliance-driven teams |
| Third-party routing (OpenRouter, Together, Fireworks, DeepInfra, Bedrock, Azure) | Run the open weights on non-China (US/EU) infrastructure | Teams that cannot send data to China, or want one API across models |

2. Model lineup (current, 2026)

The current generation is DeepSeek-V4 (released April 24, 2026), a hybrid model: reasoning ("thinking") and fast ("non-thinking") behavior are controlled by a mode, not by switching models. This unifies what used to be split between a chat model and the standalone R1 reasoning model.

| Model | Total / active params | Context | Max output | License |
| --- | --- | --- | --- | --- |
| DeepSeek-V4-Pro | 1.6T / 49B active (MoE) | 1M tokens | 384K tokens | MIT |
| DeepSeek-V4-Flash | 284B / 13B active (MoE) | 1M tokens | 384K tokens | MIT |

The V4-Pro card describes three reasoning modes: Non-Think, Think High, and Think Max.

A few clarifications
  • Multimodality lives in separate models (DeepSeek-OCR / OCR-2, Janus-Pro, DeepSeek-VL2). Treat the deepseek-v4-* text API as text-in / text-out.
  • DeepSeek-Coder is no longer a separate hosted model; coding capability is folded into the general models. The older DeepSeek-Coder-V2 open weights remain downloadable but are legacy.
  • The V4-Flash parameter count is reported as 284B total / 13B active in the official release note. Some listings have shown conflicting figures, so confirm on the live model card.

3. The API

The DeepSeek API is OpenAI-compatible (and Anthropic-compatible), so it is a drop-in for existing codebases. You only change base_url and the API key:

  • OpenAI format: https://api.deepseek.com (also https://api.deepseek.com/v1)
  • Anthropic format: https://api.deepseek.com/anthropic

Current model identifiers: deepseek-v4-flash and deepseek-v4-pro. The legacy deepseek-chat and deepseek-reasoner still work but retire on 2026-07-24 at 15:59 UTC (they currently map to the non-thinking and thinking modes of V4-Flash, respectively). Prompt caching is automatic; by the price snapshot in section 5, cached input is 50× cheaper than uncached on V4-Flash and 120× cheaper on V4-Pro.
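
The migration away from the legacy IDs can be sketched as a small lookup. This is a minimal sketch, not an official tool: the helper names migrate_model_id and legacy_ids_retired are hypothetical, and the mode labels are illustrative strings rather than API parameter values. How a mode is actually selected per request is not covered in this guide, so check the API reference for that.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Retirement instant for the legacy IDs, as stated in this guide.
LEGACY_RETIREMENT = datetime(2026, 7, 24, 15, 59, tzinfo=timezone.utc)

# Legacy ID -> (current ID, mode it currently maps to).
# The mode labels are illustrative, not official request parameters.
LEGACY_MODEL_MAP = {
    "deepseek-chat": ("deepseek-v4-flash", "non-thinking"),
    "deepseek-reasoner": ("deepseek-v4-flash", "thinking"),
}


def migrate_model_id(model_id: str) -> Tuple[str, Optional[str]]:
    """Return (current_model_id, mode); mode is None for IDs that are not legacy."""
    return LEGACY_MODEL_MAP.get(model_id, (model_id, None))


def legacy_ids_retired(now: Optional[datetime] = None) -> bool:
    """True once the legacy IDs have passed their retirement instant."""
    now = now or datetime.now(timezone.utc)
    return now >= LEGACY_RETIREMENT
```

Running migrate_model_id over the model names in a codebase's configuration gives a quick list of what still needs to change before the retirement date.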

curl https://api.deepseek.com/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
-d '{"model": "deepseek-v4-pro", "messages": [{"role": "user", "content": "Hello!"}]}'
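
For codebases that do not shell out to curl, the same request can be built with the Python standard library alone. This is a sketch under stated assumptions: build_chat_request and send_chat_request are hypothetical helper names, and the response parsing assumes the OpenAI-style shape (choices[0].message.content) implied by the API's OpenAI compatibility.

```python
import json
import urllib.request

# Same endpoint as the curl command above.
API_URL = "https://api.deepseek.com/chat/completions"


def build_chat_request(api_key: str, prompt: str,
                       model: str = "deepseek-v4-pro") -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=60) as response:
        body = json.load(response)
    # Assumption: OpenAI-compatible response layout.
    return body["choices"][0]["message"]["content"]
```

Typical usage is send_chat_request(build_chat_request(os.environ["DEEPSEEK_API_KEY"], "Hello!")) after importing os. Keeping construction separate from sending makes the payload easy to inspect or log without a network call.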

4. Data residency and compliance

Where the data goes matters

The official API and consumer app are operated from mainland China (per DeepSeek's privacy policy). For a regulated EU B2B context (GDPR, known in German as DSGVO, and customers bound by professional secrecy), treat the hosted API as not suitable for sensitive or personal data without prior legal review. As precedent, Italy's data-protection authority ordered a block on processing Italian users' data in January 2025.

Two ways to keep data out of China:

  1. Self-host the MIT-licensed open weights on your own EU/US infrastructure (e.g. vLLM), so that prompts and outputs never leave your environment.
  2. Route via a US/EU-hosted third party: OpenRouter, Together, Fireworks, DeepInfra, AWS Bedrock, or Azure AI Foundry. Expect higher per-token cost than the official API in exchange for residency and SLAs.
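
Because both options expose the same kind of endpoint to client code, the residency rule can be enforced in one place before any request is sent. The following is a minimal sketch: the Endpoint type, the require_residency helper, and the region tags are all hypothetical, and http://localhost:8000/v1 is only an assumed address for a locally running vLLM server. Verify where each endpoint actually processes data before tagging it.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str
    region: str  # where prompts are processed, as verified by you


# Illustrative entries only; the region tags are assumptions to be verified.
OFFICIAL = Endpoint("deepseek-official", "https://api.deepseek.com", "CN")
SELF_HOSTED = Endpoint("self-hosted-vllm", "http://localhost:8000/v1", "EU")


def require_residency(endpoint: Endpoint, allowed_regions: FrozenSet[str]) -> Endpoint:
    """Return the endpoint if its region is allowed, otherwise raise ValueError."""
    if endpoint.region not in allowed_regions:
        raise ValueError(
            f"{endpoint.name} processes data in {endpoint.region}, "
            f"outside the allowed regions {sorted(allowed_regions)}"
        )
    return endpoint
```

Calling require_residency(OFFICIAL, frozenset({"EU", "US"})) raises, while the self-hosted entry passes. A check like this fails loudly at configuration time instead of relying on each caller to remember the policy.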

Content behavior (factual, third-party): independent red-team studies report that DeepSeek models decline or steer answers on topics that are politically sensitive in China. Because this is applied at the model level (fine-tuning), it can persist even in self-hosted open weights; it is not only an app-layer filter. These are third-party findings, not official DeepSeek statements.


5. Pricing (official, snapshot)

Snapshot: verify at the source

Per 1M tokens, USD, from the official pricing page. Third-party providers set their own (usually higher) prices. Off-peak discounts are not currently offered.

| Model | Input (cache hit) | Input (cache miss) | Output |
| --- | --- | --- | --- |
| deepseek-v4-flash | $0.0028 | $0.14 | $0.28 |
| deepseek-v4-pro | $0.003625 | $0.435 | $0.87 |

The standout lever is the cache-hit input rate: by the table above, it is 50× cheaper than a cache miss on V4-Flash and 120× cheaper on V4-Pro. Structure long, stable system prompts or RAG context so that they hit the cache.
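
To see what the cache-hit rate means for a single request, the snapshot prices can be plugged into a short estimate. This is a sketch only: estimate_cost_usd is a hypothetical helper, the prices are copied from the table above and will drift, and the token counts in the example are made up for illustration.

```python
# USD per 1M tokens, copied from the snapshot table above.
PRICES = {
    "deepseek-v4-flash": {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28},
    "deepseek-v4-pro": {"cache_hit": 0.003625, "cache_miss": 0.435, "output": 0.87},
}


def estimate_cost_usd(model: str, cached_input_tokens: int,
                      uncached_input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD from token counts."""
    price = PRICES[model]
    total_per_million = (
        cached_input_tokens * price["cache_hit"]
        + uncached_input_tokens * price["cache_miss"]
        + output_tokens * price["output"]
    )
    return total_per_million / 1_000_000
```

For a V4-Flash request with a 10,000-token system prompt, a 500-token user message, and 1,000 output tokens, the estimate is about $0.000378 when the system prompt hits the cache and about $0.00175 when nothing is cached. Once the input is cached, output tokens dominate the bill.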


6. Decision guide

| Situation | Choose… |
| --- | --- |
| cost is the priority, data is non-sensitive, China residency is acceptable | the official DeepSeek API (cheapest, full feature set, OpenAI drop-in) |
| data must stay out of China; you want redundancy / SLAs / a unified multi-model API | third-party routing (Together / Fireworks / DeepInfra / OpenRouter / Bedrock / Azure) |
| strict compliance (GDPR/DSGVO, etc.) or zero per-token cost at scale, and you have GPUs | self-host the MIT open weights (note: V4-Pro is a 1.6T-param MoE, which means real infrastructure and ops work) |
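
The self-hosting note can be made concrete with a rough weight-memory estimate. This is back-of-the-envelope arithmetic, not a sizing guide: weight_memory_gb is a hypothetical helper, the bytes-per-parameter values are assumptions (check the model card for the precision of the released weights), and the figure excludes KV cache, activations, and runtime overhead. In a typical MoE deployment all experts stay loaded, so the total parameter count drives memory, not the active count.

```python
def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return total_params * bytes_per_param / 1e9


# Total parameter counts from the model lineup table in section 2.
V4_PRO_PARAMS = 1.6e12
V4_FLASH_PARAMS = 284e9

# Assumed precisions: 1 byte per parameter (8-bit) and 2 bytes (16-bit).
for name, params in (("V4-Pro", V4_PRO_PARAMS), ("V4-Flash", V4_FLASH_PARAMS)):
    for bytes_per_param in (1, 2):
        gb = weight_memory_gb(params, bytes_per_param)
        print(f"{name} at {bytes_per_param} byte(s)/param: about {gb:,.0f} GB")
```

Under these assumptions V4-Pro needs roughly 1,600 GB for weights at 8-bit precision and 3,200 GB at 16-bit, which implies a multi-node GPU cluster. V4-Flash, at roughly 284 GB and 568 GB, is the more approachable self-hosting target.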
