DeepSeek Coder Guide
This guide is the model-centric companion to the Cursor + DeepSeek + VS Code Guide. That guide is about the editor setup. This one is about the model itself: what DeepSeek Coder is, the three realistic ways to host it, and how to use it well once it is running.
The facts in section 1 are taken from the official DeepSeek Coder project page at deepseekcoder.github.io, checked on June 22, 2026. That page documents the original DeepSeek Coder release. Where this guide goes beyond the page (hosting commands, usage advice), I mark it as practice or inference, and I avoid quoting numbers for newer models I cannot verify from the linked source.
1. What DeepSeek Coder is
DeepSeek Coder is a family of open-weight code models published by DeepSeek. According to the official project page, the release includes:
1.1 Variants
- Parameter sizes: 1.3B, 5.7B, 6.7B, and 33B
- Two flavors of each:
  - Base: pretrained, best for raw completion and fill-in-the-middle
  - Instruct: instruction-tuned, best for chat-style "do this for me" prompts
1.2 Training and capabilities
- Trained from scratch on 2 trillion tokens
- Composition: 87% code, 13% natural language (English and Chinese)
- Coverage of 80+ programming languages
- 16K token context window, which the page frames as enabling project-level code completion and infilling (not just single-function snippets)
1.3 Benchmarks (as stated on the page)
The page positions the 33B model against CodeLlama-34B:
| Benchmark | DeepSeek-Coder-33B vs CodeLlama-34B |
|---|---|
| HumanEval (Python) | +7.9% |
| HumanEval (Multilingual) | +9.3% |
| MBPP | +10.8% |
| DS-1000 | +5.9% |
It also states that DeepSeek-Coder-Instruct-33B surpasses GPT-3.5-turbo on HumanEval. Treat these as the vendor's own published figures, not as independent third-party results.
1.4 License and access
- Described on the page as open source and free for research and commercial use
- Weights: huggingface.co/deepseek-ai
- Code and exact usage formats: github.com/deepseek-ai/DeepSeek-Coder
- Hosted chat: coder.deepseek.com
"Free for commercial use" is what the project page states, but model licenses can carry conditions and can change between releases. Read the license file shipped with the specific weights you download before you ship anything on top of it.
The linked page covers the original DeepSeek Coder. DeepSeek has since published newer code-capable models (e.g. a DeepSeek-Coder-V2 line and general V-series chat/reasoner models). I am deliberately not quoting their parameter counts or benchmarks here, because the linked source does not contain them. For anything newer, check the DeepSeek-Coder repository and the deepseek-ai Hugging Face org directly.
2. The three ways to host it
There are exactly three realistic modes. Pick by how much control and scale you need.
| Mode | What it is | Best when |
|---|---|---|
| Local runtime (Ollama / LM Studio) | One-machine inference, minimal setup | Single developer, laptop/workstation, "just works" |
| Self-hosted server (vLLM / TGI) | OpenAI-compatible API on your GPU box | A team, larger models, many tools sharing one backend |
| DeepSeek API | DeepSeek's own hosted, token-billed API | No hardware, fastest start, accept the data leaving your network |
The rest of this section walks through each one. For the editor wiring (Continue in VS Code, Cursor's free tier), see the Cursor + DeepSeek + VS Code Guide; this guide does not repeat it.
2.1 Local runtime with Ollama (easiest)
The fastest path for one machine. Install Ollama, then run a DeepSeek Coder model (`ollama run` downloads it on first use and then opens an interactive prompt):
```shell
ollama run deepseek-coder:6.7b
```
The main size tags for DeepSeek Coder in the Ollama library are:
- deepseek-coder:1.3b
- deepseek-coder:6.7b
- deepseek-coder:33b
Ollama serves an HTTP endpoint on http://localhost:11434 and also exposes an OpenAI-compatible path at http://localhost:11434/v1, so most tools and SDKs can talk to it without custom code.
```shell
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-coder:6.7b",
    "messages": [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
  }'
```
If you get JSON back with a code block in it, your local model is live.
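The same request can also be sent from a program. Below is a minimal sketch that uses only Python's standard library. It assumes Ollama's default address shown above and the usual OpenAI-style response shape. `build_chat_request` and `chat` are illustrative names, not part of any SDK.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; adjust if you changed the host or port.
OLLAMA_BASE_URL = "http://localhost:11434/v1"


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send one chat turn and return the reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # OpenAI-style responses carry the text at choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

For example, `chat(OLLAMA_BASE_URL, "deepseek-coder:6.7b", "Write a Python function that reverses a linked list.")` should return the reply text. Because the base URL is a parameter, the same helper should also work against the vLLM endpoint in 2.2.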
2.2 Self-hosted server with vLLM (scales)
When one laptop is not enough (bigger models, a shared team backend, or many tools routed to one endpoint), use vLLM, which exposes an OpenAI-compatible server:
```shell
vllm serve deepseek-ai/deepseek-coder-6.7b-instruct
```
That gives you an endpoint such as http://your-server:8000/v1 (port 8000 is vLLM's default) that behaves like the OpenAI API. Any client that lets you set a custom OpenAI-compatible base URL can point at it, including VS Code extensions, agent-style coding tools, and your own apps.
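The `model` field in each request must match a name the server actually serves. By default, vLLM registers the model under the path you passed to `vllm serve`, and the `--served-model-name` flag overrides that. Checking the standard `/models` listing therefore saves you from guessing. Below is a minimal stdlib sketch. It assumes the usual OpenAI-style response: a `data` list of objects that each carry an `id`. The function names are illustrative.

```python
import json
import urllib.request


def model_ids(models_response: dict) -> list[str]:
    """Pull the model names out of an OpenAI-style GET /models response."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str, timeout: float = 10.0) -> list[str]:
    """Ask an OpenAI-compatible server (vLLM, Ollama, ...) what it is serving."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/models", timeout=timeout) as response:
        return model_ids(json.load(response))
```

Against the server started above, `fetch_model_ids("http://your-server:8000/v1")` would be expected to list `deepseek-ai/deepseek-coder-6.7b-instruct` unless you renamed it.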
The 33B model needs serious GPU memory (think a high-VRAM card or multi-GPU, especially at higher precision). The 6.7B model is the realistic "runs on a single decent GPU" choice. Exact VRAM depends on quantization and context length, so benchmark on your own hardware rather than trusting a table.
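Before you buy or rent a GPU, a back-of-envelope check helps. The weights alone need roughly parameters × bits per parameter ÷ 8 bytes. At 16-bit precision, 6.7B parameters come to about 13.4 GB and 33B to about 66 GB. The sketch below gives only that floor. It deliberately ignores the KV cache, quantization metadata, and runtime overhead, which is why measuring on your own hardware still matters.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores the KV cache (which grows with context length and batch size)
    and runtime overhead, so real usage is higher.
    """
    return num_params * bits_per_param / 8 / 1e9


for label, params in [("6.7B", 6.7e9), ("33B", 33e9)]:
    for bits in (16, 8, 4):
        print(f"{label} at {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB of weights")
```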
2.3 DeepSeek's hosted API (no hardware)
If you do not want to run anything, DeepSeek offers a paid, OpenAI-compatible API. This is the least private option: your prompts and code leave your network, so weigh it against your data-handling requirements before sending anything sensitive.
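All three modes speak the same OpenAI-style chat protocol. Switching between them is therefore mostly a matter of changing the base URL and model name, and deciding whether to send an API key. Below is a minimal sketch of that idea. Check each of these assumptions against your own setup:
- the default ports (11434 for Ollama, 8000 for vLLM);
- the `your-server` placeholder from 2.2;
- `https://api.deepseek.com` as the hosted base URL, per DeepSeek's API documentation;
- a hypothetical `DEEPSEEK_API_KEY` environment variable;
- a placeholder hosted model name, which you should replace with one listed in DeepSeek's docs.

```python
import json
import os
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Backend:
    base_url: str
    model: str
    api_key_env: Optional[str] = None  # set only for the hosted API


# Assumed values; check each against your own setup and DeepSeek's API docs.
BACKENDS = {
    "local": Backend("http://localhost:11434/v1", "deepseek-coder:6.7b"),
    "team": Backend("http://your-server:8000/v1", "deepseek-ai/deepseek-coder-6.7b-instruct"),
    # Placeholder model name: replace it with one listed in DeepSeek's API documentation.
    "hosted": Backend("https://api.deepseek.com", "MODEL_FROM_DEEPSEEK_DOCS", "DEEPSEEK_API_KEY"),
}


def make_request(backend: Backend, prompt: str) -> urllib.request.Request:
    """Same OpenAI-style chat body for every mode; only URL, model, and auth differ."""
    headers = {"Content-Type": "application/json"}
    if backend.api_key_env:
        # Read the key from the environment (KeyError if unset) rather than hard-coding it.
        headers["Authorization"] = f"Bearer {os.environ[backend.api_key_env]}"
    body = {"model": backend.model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        backend.base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To pick a mode, pass the matching entry, e.g. `make_request(BACKENDS["team"], "...")`, then send the result with `urllib.request.urlopen`. Nothing else in the calling code has to change.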
3. Which variant should you actually run?
Two independent choices: size and base vs instruct.
3.1 Pick a size
| Size | Use it for |
|---|---|
| 1.3B | Smoke tests, weak laptops, autocomplete-only, very low latency |
| 6.7B | The realistic local baseline: good quality, modest hardware |
| 33B | Highest quality, needs strong GPU hardware or a server |
If you are unsure, start at 6.7B and only move up if quality is the bottleneck, or down if latency is.
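The sizing rule above is small enough to write down. Below is a hypothetical helper. It assumes you only move between the three sizes in the table, one step at a time.

```python
from typing import Optional

# Sizes in the order of the table above.
SIZES = ["1.3b", "6.7b", "33b"]
DEFAULT_SIZE = "6.7b"


def next_size(current: str, bottleneck: Optional[str]) -> str:
    """Step up one size when quality is the problem, down one when latency is."""
    index = SIZES.index(current)
    if bottleneck == "quality":
        index = min(index + 1, len(SIZES) - 1)
    elif bottleneck == "latency":
        index = max(index - 1, 0)
    return SIZES[index]
```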
3.2 Base vs Instruct: this matters
This is the part people get wrong.
- Use `-instruct` when you talk to the model in chat: "refactor this", "explain this stack trace", "write tests for this function". This is what you want for an editor chat panel or an agent.
- Use `-base` when you want raw completion or fill-in-the-middle with no conversational wrapper, such as inline autocomplete, code infilling, or your own custom prompting harness.
A common strong setup is two models at once: a small base model for fast inline completion, and a larger instruct model for the chat/agent panel.
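One way to encode the two-model setup is a router that sends each kind of request to the right flavor. The Hugging Face ids below follow the deepseek-ai naming. The pairing (1.3B base for completion, 6.7B instruct for chat) and the `Task` categories are illustrative assumptions, not an official recommendation.

```python
from enum import Enum


class Task(Enum):
    INLINE_COMPLETION = "inline_completion"
    INFILL = "infill"
    CHAT = "chat"
    AGENT = "agent"


# Hypothetical pairing: a small base model for speed, a larger instruct model for chat.
BASE_MODEL = "deepseek-ai/deepseek-coder-1.3b-base"
INSTRUCT_MODEL = "deepseek-ai/deepseek-coder-6.7b-instruct"


def model_for(task: Task) -> str:
    """Raw completion and infilling go to base; conversational work goes to instruct."""
    if task in (Task.INLINE_COMPLETION, Task.INFILL):
        return BASE_MODEL
    return INSTRUCT_MODEL
```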
4. Getting the best results
4.1 Use fill-in-the-middle (FIM) for completion
The official page explicitly calls out infilling and project-level completion as core capabilities: the model was trained to fill a gap given code on both sides of the cursor, not just to continue from the end. This is exactly what makes it strong for inline autocomplete.
FIM only works if you wrap the prefix/suffix/hole in DeepSeek Coder's specific special tokens. The exact token strings are defined in the official repository β copy them from there rather than from memory, because getting the delimiters wrong silently degrades output. Most editor extensions (e.g. Continue) handle this for you when you select the right template; you mainly need to care if you build your own completion harness.
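The prompt layout itself is simple. It consists of a begin token, the code before the cursor, a hole token, the code after the cursor, and an end token. This prefix-hole-suffix order is the one shown in the repository's own example; verify it there. The sketch below deliberately keeps the token strings as inputs, so you paste them from the official repository instead of trusting a copy. `FimTokens` and `build_fim_prompt` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FimTokens:
    """FIM delimiter strings. Paste the exact values from the DeepSeek-Coder repo."""
    begin: str
    hole: str
    end: str


def build_fim_prompt(prefix: str, suffix: str, tokens: FimTokens) -> str:
    """Prefix-hole-suffix layout: the model is asked to generate the hole."""
    if not (tokens.begin and tokens.hole and tokens.end):
        raise ValueError("all three FIM tokens are required")
    return f"{tokens.begin}{prefix}{tokens.hole}{suffix}{tokens.end}"
```

Send the result to a base model as a raw prompt on a plain completion endpoint, not wrapped in a chat template. The generated text is what belongs in the hole.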
4.2 Feed it real context
The 16K window is enough for multi-file context, which is where DeepSeek Coder is meant to shine. Practical wins:
- Paste the actual surrounding file(s), not just the one function.
- Include relevant type definitions, interfaces, and call sites so it matches your codebase instead of inventing APIs.
- For "write tests for X", give it both X and an existing test so it copies your testing conventions.
4.3 Prompt the instruct model like a senior
- Be explicit about language, framework version, and constraints ("PHP 8.3, Laravel 11, no Eloquent raw queries").
- Ask for one thing per turn; long multi-task prompts get muddier output than a focused sequence.
- When it drifts from your conventions, correct with a concrete example rather than restating the rule abstractly.
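Below is a small illustrative helper that applies these rules. It states the constraints explicitly up front and asks for one task per turn. Optionally, it includes a snippet of existing code as the concrete example of your conventions. It puts everything in a single user message, which avoids depending on how your server's chat template handles system messages. The function name and wording are assumptions.

```python
from typing import Optional


def build_messages(constraints: list[str], task: str,
                   style_example: Optional[str] = None) -> list[dict]:
    """One focused request: explicit constraints first, then a single task."""
    lines = ["Constraints:"] + [f"- {c}" for c in constraints] + ["", f"Task: {task}"]
    if style_example:
        # Show the convention with real code instead of restating the rule.
        lines += ["", "Follow the conventions of this existing code:", style_example]
    return [{"role": "user", "content": "\n".join(lines)}]
```

For example, `build_messages(["PHP 8.3", "Laravel 11", "no Eloquent raw queries"], "Add pagination to the orders index.")` yields one user message that is ready to send with any of the clients above.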
4.4 Tune the sampling
- For code, keep temperature low (roughly 0 to 0.3). High temperature buys you creative prose and buggy code.
- Set a stop sequence if you are completing inside a larger structure, so it does not run past the boundary you care about.
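In an OpenAI-compatible request, these two knobs map to the standard `temperature` and `stop` fields. Both Ollama's `/v1` path and vLLM follow the OpenAI request format, so they should accept these fields. Below is a sketch of a completion body for code. It assumes a `/completions`-style endpoint. The 0.2 default and the `\ndef ` / `\nclass ` stop strings are example choices for Python, not tuned values.

```python
from typing import Optional


def completion_payload(model: str, prompt: str, temperature: float = 0.2,
                       stop: Optional[list[str]] = None, max_tokens: int = 256) -> dict:
    """Body for an OpenAI-compatible /completions call tuned for code."""
    payload = {
        "model": model,
        "prompt": prompt,
        "temperature": temperature,  # low: favor the most likely tokens
        "max_tokens": max_tokens,    # hard cap in case no stop sequence fires
    }
    if stop:
        payload["stop"] = stop  # generation ends when any of these strings would appear
    return payload


# Example: complete one function body, stopping before the next top-level def/class.
body = completion_payload(
    "deepseek-ai/deepseek-coder-1.3b-base",
    "def reverse(head):\n",
    stop=["\ndef ", "\nclass "],
)
```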
4.5 Always review the output
DeepSeek Coder will produce confident, plausible, and sometimes wrong code, like every code model. The 87%-code training mix makes it fluent, not infallible. Treat every suggestion as a draft to review, especially around security-sensitive paths, edge cases, and anything touching real data.
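A small part of that review can be automated cheaply. The sketch below pulls fenced Python blocks out of a model reply and uses the standard library's `ast.parse` to check that they at least parse. This catches only syntax errors. Code that parses can still be wrong, insecure, or call APIs that do not exist, so treat this as a first filter before human review, not a replacement for it. The sketch assumes untagged blocks are Python, and `python_syntax_errors` is an illustrative name.

```python
import ast
import re

FENCE = "`" * 3  # a Markdown code fence, built up so this snippet can sit inside one
# Matches fences tagged python/py or untagged (assumed Python); other languages are skipped.
CODE_BLOCK = re.compile(FENCE + r"(?:python|py)?\n(.*?)" + FENCE, re.DOTALL)


def python_syntax_errors(reply: str) -> list[str]:
    """Report fenced blocks in a model reply that do not even parse as Python."""
    errors = []
    for number, block in enumerate(CODE_BLOCK.findall(reply), start=1):
        try:
            ast.parse(block)
        except SyntaxError as exc:
            errors.append(f"block {number}, line {exc.lineno}: {exc.msg}")
    return errors
```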
5. Recommended setups
5.1 Single developer, cheap and local
```text
Ollama + deepseek-coder:6.7b (instruct for chat)
       + a small base model for inline completion
```
No token bills, fully local, good daily quality.
5.2 Team, shared backend
```text
vLLM serving a DeepSeek Coder model on a GPU box
  → OpenAI-compatible endpoint
  → every developer's editor + internal tools point at it
```
One model to maintain, consistent behavior across the team, code never leaves your infrastructure.
5.3 Zero infrastructure
```text
DeepSeek hosted API
```
Fastest to start; accept that prompts and code leave your network.
6. Bottom line
- What it is: an open-weight code-model family (1.3B to 33B, base + instruct), trained on 2T tokens, with a 16K context window, strong published code benchmarks, and, per the project page, open and free for commercial use.
- How to host it: Ollama for one machine, vLLM for a team server, the DeepSeek API if you want no hardware.
- How to use it well: match base-vs-instruct to the task, lean on fill-in-the-middle for completion, feed it real multi-file context, keep temperature low, and review everything.
For the editor side of this story (Cursor's free tier, Continue in VS Code, and the self-hosting trade-offs), go to the Cursor + DeepSeek + VS Code Guide.