DeepSeek Guide

What is this about?

DeepSeek is one consumer app, one official (China-hosted) API, a set of MIT-licensed open weights, and a broad third-party hosting ecosystem that runs those same weights elsewhere. This guide maps those surfaces, shows the OpenAI-compatible quickstart, and is explicit about the data-residency trade-offs that matter for a regulated EU context.

Source scope as of June 23, 2026

Based on official DeepSeek sources (api-docs.deepseek.com, huggingface.co/deepseek-ai, the DeepSeek privacy policy). The current generation is DeepSeek-V4 (released April 24, 2026). The legacy API model IDs deepseek-chat and deepseek-reasoner still resolve but are scheduled to retire on 2026-07-24 — migrate to the V4 IDs. Pricing and model specs change; re-check the live pricing page before quoting figures.

1. The mental model

| Surface | What it is for | Primary user |
| --- | --- | --- |
| DeepSeek app (`chat.deepseek.com`) | Free consumer chat with a DeepThink toggle (off = fast, on = reasoning), web search, and file upload | End users |
| DeepSeek API (`platform.deepseek.com`) | Pay-as-you-go developer API; OpenAI- and Anthropic-compatible | Developers |
| Open weights (Hugging Face, MIT license) | Download to self-host or fine-tune | Self-hosters, researchers, compliance-driven teams |
| Third-party routing (OpenRouter, Together, Fireworks, DeepInfra, Bedrock, Azure) | Run the open weights on non-China (US/EU) infrastructure | Teams that cannot send data to China, or want one API across models |

2. Model lineup (current, 2026)

The current generation is DeepSeek-V4 (released April 24, 2026), a hybrid model — reasoning ("thinking") and fast ("non-thinking") behavior are controlled by a mode, not by switching models. This unifies what used to be split between a chat model and the standalone R1 reasoning model.

| Model | Total / active params | Context | Max output | License |
| --- | --- | --- | --- | --- |
| DeepSeek-V4-Pro | 1.6T / 49B active (MoE) | 1M tokens | 384K tokens | MIT |
| DeepSeek-V4-Flash | 284B / 13B active (MoE) | 1M tokens | 384K tokens | MIT |
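To put the parameter counts above in perspective for self-hosting, here is a rough sizing sketch. It assumes weight memory is simply parameter count times bytes per parameter and ignores KV cache, activations, and runtime overhead. The bytes-per-parameter values are illustrative assumptions, not published figures for these checkpoints.

```python
# Rough sizing sketch. Parameter counts (in billions) come from the table above;
# the bytes-per-parameter values are assumptions (1 = 8-bit, 2 = 16-bit weights).
MODELS = {
    "DeepSeek-V4-Pro": {"total_b": 1600.0, "active_b": 49.0},
    "DeepSeek-V4-Flash": {"total_b": 284.0, "active_b": 13.0},
}


def weight_memory_gb(total_params_billion: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal GB (1e9 bytes)."""
    # billions of parameters * bytes per parameter = billions of bytes = GB
    return total_params_billion * bytes_per_param


def active_fraction(total_params_billion: float, active_params_billion: float) -> float:
    """Share of parameters used per token in the MoE."""
    return active_params_billion / total_params_billion


for name, p in MODELS.items():
    share = active_fraction(p["total_b"], p["active_b"])
    for bpp in (1, 2):
        gb = weight_memory_gb(p["total_b"], bpp)
        print(f"{name}: {gb:,.0f} GB at {bpp} byte(s)/param, {share:.1%} active per token")
```

Only a few percent of the parameters are active per token, but an MoE deployment generally still has to keep every expert in memory or offload it, so total parameters drive the hardware footprint.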

The V4-Pro model card describes three reasoning modes: Non-Think, Think High, and Think Max.

A few clarifications

Multimodality lives in separate models (DeepSeek-OCR / OCR-2, Janus-Pro, DeepSeek-VL2). Treat the deepseek-v4-* text API as text-in / text-out.
DeepSeek-Coder is no longer a separate hosted model — coding capability is folded into the general models. The older DeepSeek-Coder-V2 open weights remain downloadable but are legacy.
The V4-Flash parameter count is given as 284B total / 13B active in the official release note; some listings have shown conflicting figures, so confirm on the live model card.

3. The API

The DeepSeek API is OpenAI-compatible (and Anthropic-compatible), so it works as a drop-in replacement in existing codebases: you change only base_url and the API key. The base URLs are:

OpenAI format: https://api.deepseek.com (also https://api.deepseek.com/v1)
Anthropic format: https://api.deepseek.com/anthropic

Current model identifiers: deepseek-v4-flash and deepseek-v4-pro. The legacy deepseek-chat and deepseek-reasoner still work but retire on 2026-07-24 at 15:59 UTC (they currently map to the non-thinking and thinking modes of V4-Flash). Prompt caching is automatic; at the rates listed in section 5, cached input is 50× cheaper than uncached input on V4-Flash and 120× cheaper on V4-Pro.

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{"model": "deepseek-v4-pro", "messages": [{"role": "user", "content": "Hello!"}]}'

4. Data residency and compliance

Where the data goes matters

The official API and consumer app are operated from mainland China (per DeepSeek's privacy policy). For a regulated EU B2B context (GDPR, known as DSGVO in Germany, and customers bound by professional secrecy), the hosted API should be treated as unsuitable for sensitive or personal data without prior legal review. As precedent, Italy's data-protection authority ordered a block on processing Italian users' data in January 2025.

Two ways to keep data out of China:

Self-host the MIT-licensed open weights on your own EU/US infrastructure (e.g. vLLM) — prompts and outputs never leave your environment.
Route via a US/EU-hosted third party — OpenRouter, Together, Fireworks, DeepInfra, AWS Bedrock, or Azure AI Foundry. Expect higher per-token cost than the official API in exchange for residency and SLAs.
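Because all three routes can speak the OpenAI format, the two options above mostly come down to which base URL a client points at. The sketch below adds a guard that refuses to send sensitive data to the China-hosted endpoint. The self-hosted URL assumes vLLM's OpenAI-compatible server on its default local port, and the OpenRouter URL is an assumption to verify against that provider's documentation; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str
    china_hosted: bool


ENDPOINTS = {
    # Official API: base URL from this guide.
    "official": Endpoint("DeepSeek official API", "https://api.deepseek.com", True),
    # Assumption: a local vLLM OpenAI-compatible server on its default port.
    "self_hosted": Endpoint("Self-hosted vLLM", "http://localhost:8000/v1", False),
    # Assumption: verify the URL and model naming with the provider.
    "third_party": Endpoint("OpenRouter", "https://openrouter.ai/api/v1", False),
}


def pick_endpoint(key: str, contains_sensitive_data: bool) -> Endpoint:
    """Return the endpoint, refusing China-hosted routes for sensitive data."""
    endpoint = ENDPOINTS[key]
    if contains_sensitive_data and endpoint.china_hosted:
        raise ValueError(
            f"{endpoint.name} is China-hosted; sensitive data needs legal review first"
        )
    return endpoint


print(pick_endpoint("self_hosted", contains_sensitive_data=True).base_url)
```

Placing the guard in one function means a residency policy change is a one-line edit rather than a search through every call site.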

Content behavior (factual, third-party): independent red-team studies report that DeepSeek models decline or steer answers on topics that are politically sensitive in China. Because this is applied at the model level (fine-tuning), it can persist even in self-hosted open weights — it is not only an app-layer filter. These are third-party findings, not official DeepSeek statements.

5. Pricing (official, snapshot)

Snapshot — verify at the source

Per 1M tokens, USD, from the official pricing page. Third-party providers set their own (usually higher) prices. Off-peak discounts are not currently offered.

| Model | Input (cache hit) | Input (cache miss) | Output |
| --- | --- | --- | --- |
| `deepseek-v4-flash` | $0.0028 | $0.14 | $0.28 |
| `deepseek-v4-pro` | $0.003625 | $0.435 | $0.87 |

The standout lever is the cache-hit input rate (50× cheaper than a cache miss on V4-Flash and 120× cheaper on V4-Pro at the rates above): structure long, stable system prompts or RAG context to hit the cache.
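The effect of cache hits on a bill can be estimated with the snapshot prices from the table above. This is a sketch: the function name and the example token counts are hypothetical, and the prices should be re-checked against the live pricing page before use.

```python
# USD per 1M tokens, copied from the snapshot table above:
# (input cache hit, input cache miss, output)
PRICES = {
    "deepseek-v4-flash": (0.0028, 0.14, 0.28),
    "deepseek-v4-pro": (0.003625, 0.435, 0.87),
}


def estimate_cost(model: str, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one workload at the snapshot prices."""
    hit_price, miss_price, output_price = PRICES[model]
    total = hit_tokens * hit_price + miss_tokens * miss_price + output_tokens * output_price
    return total / 1_000_000


# Hypothetical workload: 1,000 requests sharing a 20,000-token system prompt,
# each with 500 fresh input tokens and 800 output tokens.
requests = 1_000
shared, fresh, out = 20_000, 500, 800

no_cache = estimate_cost("deepseek-v4-pro", 0, requests * (shared + fresh), requests * out)
with_cache = estimate_cost("deepseek-v4-pro", requests * shared, requests * fresh, requests * out)
print(f"without cache: ${no_cache:.2f}, with cache hits: ${with_cache:.2f}")
```

The comparison treats every request as a cache hit on the shared prefix, which is the best case; the first request and any request after cache expiry pay the miss rate.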

6. Decision guide

| Situation | Choose… |
| --- | --- |
| Cost is the priority, data is non-sensitive, China residency is acceptable | the official DeepSeek API (cheapest, full feature set, OpenAI drop-in) |
| Data must stay out of China; you want redundancy / SLAs / a unified multi-model API | third-party routing (Together / Fireworks / DeepInfra / OpenRouter / Bedrock / Azure) |
| Strict compliance (GDPR, etc.) or zero per-token cost at scale, and you have GPUs | self-host the MIT open weights (note: V4-Pro is a 1.6T-param MoE, which means real infra and ops) |
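The decision guide above can be expressed as a small function, for example to document the choice in a project's configuration checks. This is a simplified sketch with hypothetical names; the fallback for strict compliance without GPUs is an assumption, since the guide does not cover that case.

```python
def choose_surface(china_residency_ok: bool, strict_compliance: bool, has_gpus: bool) -> str:
    """Map the three situations in the decision guide to a recommendation."""
    if strict_compliance and has_gpus:
        return "self-host"
    if strict_compliance or not china_residency_ok:
        # Assumption: without GPUs, strict compliance falls back to
        # third-party routing on non-China infrastructure.
        return "third-party routing"
    return "official API"


for case in [(True, False, False), (False, False, False), (False, True, True)]:
    print(case, "->", choose_surface(*case))
```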

7. Official links
