We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning performance. Through RL, numerous powerful and interesting reasoning behaviors emerged naturally in DeepSeek-R1-Zero. However, the model also suffers from endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL. DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks. To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Model Usage
Get started with code samples in your preferred language
Basic example of chat with the selected model
curl https://api.targon.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -N \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "stream": true,
    "messages": [
      {"role": "system", "content": "You are a helpful programming assistant."},
      {"role": "user", "content": "Write a bubble sort implementation in Python with comments explaining how it works"}
    ],
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 0.1,
    "frequency_penalty": 0,
    "presence_penalty": 0
  }'
Basic example of completions with the selected model
curl https://api.targon.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -N \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1",
    "stream": true,
    "prompt": "The x y problem is",
    "temperature": 0.7,
    "max_tokens": 256,
    "top_p": 0.1,
    "frequency_penalty": 0,
    "presence_penalty": 0
  }'
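Both requests above set "stream": true and pass -N (curl's --no-buffer option), so the response arrives incrementally rather than as a single JSON document. The sketch below shows one way to post-process such a stream in the shell. It assumes the endpoint follows the common OpenAI-style server-sent-events convention, where each chunk is a line of the form "data: {json}" and the stream ends with "data: [DONE]"; this page does not confirm that format, and the helper name sse_payloads is hypothetical.

```shell
# Hypothetical helper: print only the JSON payload of each streamed chunk.
# ASSUMPTION: the stream uses OpenAI-style server-sent events, i.e.
#   data: {json chunk}
#   data: [DONE]
# Lines that do not start with "data: " (blank separators, comments) are skipped.
sse_payloads() {
  # Drop carriage returns so CRLF line endings do not break the matching.
  tr -d '\r' | while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "data: [DONE]") break ;;                          # end-of-stream marker
      "data: "*) printf '%s\n' "${line#data: }" ;;      # strip the "data: " prefix
    esac
  done
}
```

To use it, append "| sse_payloads" to either curl command above; each output line should then be one JSON chunk that can be handed to a JSON tool of your choice. If the service streams in a different format, the two case patterns are the only part that would need to change.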