DeepSeek-V4-Pro
DeepSeek's frontier MoE flagship closing the gap with leading proprietary models on reasoning and agentic coding
DeepSeek • April 2026
Training Data
32+ trillion tokens, up to early 2026
Parameters
1.6 trillion (49B active)
Training Method
MoE with hybrid attention (CSA + HCA), Muon optimizer, two-stage post-training
Context Window
1,000,000 tokens
Knowledge Cutoff
Not disclosed
Key Features
Hybrid Compressed Attention • Manifold-Constrained Hyper-Connections • FP4/FP8 Mixed Precision • Open Weights (MIT)
Capabilities
Reasoning: Outstanding
Coding: Outstanding
Agentic: Outstanding
What's New in This Version
Requires 27% of the inference FLOPs and 10% of the KV cache of V3.2 at 1M tokens. Benchmark scores: SWE-bench Verified 80.6, Terminal-Bench 2.0 67.9, MMLU-Pro 87.5, GPQA 90.1, LiveCodeBench 93.5
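To put the headline figures in proportion, the short sketch below derives the share of parameters active per token and the implied savings relative to V3.2. It is only arithmetic on the numbers quoted on this card. It assumes the 27% and 10% figures mean "fraction of V3.2's cost" rather than "reduction", and the constant and function names are illustrative, not part of any DeepSeek tooling.

```python
# Figures quoted on this card. How the two percentages are read is an
# assumption: they are treated as V4-Pro's cost relative to V3.2.
TOTAL_PARAMS = 1.6e12     # 1.6 trillion total parameters
ACTIVE_PARAMS = 49e9      # 49B parameters active per token
FLOPS_VS_V32 = 0.27       # assumed: 27% of V3.2's inference FLOPs at 1M tokens
KV_CACHE_VS_V32 = 0.10    # assumed: 10% of V3.2's KV cache at 1M tokens


def active_fraction(active: float, total: float) -> float:
    """Share of a MoE model's parameters used for each token."""
    return active / total


def reduction_factor(relative_cost: float) -> float:
    """How many times smaller a relative cost is than the baseline."""
    return 1.0 / relative_cost


if __name__ == "__main__":
    print(f"Active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.2%}")
    print(f"FLOPs reduction: {reduction_factor(FLOPS_VS_V32):.1f}x")
    print(f"KV cache reduction: {reduction_factor(KV_CACHE_VS_V32):.1f}x")
```

Under these assumptions, roughly 3% of the parameters are active per token, and the long-context savings work out to about 3.7x fewer FLOPs and 10x less KV cache than V3.2.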
Other DeepSeek Models
Explore more models from DeepSeek
DeepSeek-V4-Flash
DeepSeek's smaller, fast variant of V4 — same architecture at a fraction of the cost and latency
DeepSeek-V3.2
DeepSeek's latest flagship model matching GPT-5 performance with integrated tool-use thinking
DeepSeek-V3.2-Speciale
DeepSeek's competition-focused variant (expired Dec 15, 2025; it was a temporary API-only release)