LATEST MODEL

DeepSeek-V4-Pro

DeepSeek • 🧠 The Thinker • Released April 2026

DeepSeek's frontier MoE flagship closing the gap with leading proprietary models on reasoning and agentic coding


Training Data

32+ trillion tokens, up to early 2026


Parameters

1.6 trillion (49B active)
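
To make the two parameter counts concrete, the short sketch below computes what share of the model is active for any single token. It uses only the two figures listed above; the function name is illustrative and not part of any DeepSeek tooling.

```python
# Figures taken from the spec card above.
TOTAL_PARAMS = 1.6e12   # 1.6 trillion total parameters
ACTIVE_PARAMS = 49e9    # 49B parameters active per token


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used for a single token in a sparse MoE model."""
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"Active share per token: {fraction:.2%}")
```

Roughly 3% of the weights take part in each forward pass, which is why per-token compute tracks the 49B figure while memory for the weights tracks the 1.6 trillion figure.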

Training Method

MoE with hybrid attention (CSA + HCA), Muon optimizer, two-stage post-training

Context Window

1,000,000 tokens

Knowledge Cutoff

Not disclosed

Key Features

Hybrid Compressed Attention • Manifold-Constrained Hyper-Connections • FP4/FP8 Mixed Precision • Open Weights (MIT)

Capabilities

Reasoning: Outstanding

Coding: Outstanding

Agentic: Outstanding

What's New in This Version

Uses 27% of the inference FLOPs and 10% of the KV cache of V3.2 at 1M tokens. Benchmarks: SWE-bench Verified 80.6 • Terminal-Bench 2.0 67.9 • MMLU-Pro 87.5 • GPQA 90.1 • LiveCodeBench 93.5
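
As a rough illustration of what the efficiency figures imply, the sketch below converts them into reduction factors and applies one of them to a baseline. It assumes the two percentages mean "V4-Pro cost as a share of V3.2 cost at 1M tokens". The baseline value is a made-up placeholder, not a measured V3.2 number.

```python
# Ratios from the line above, read as V4-Pro cost relative to V3.2 at 1M tokens
# (this reading is an assumption).
FLOPS_RATIO = 0.27      # inference FLOPs
KV_CACHE_RATIO = 0.10   # KV cache size


def scaled_cost(baseline: float, ratio: float) -> float:
    """Cost after applying a relative ratio to a baseline cost."""
    return baseline * ratio


def reduction_factor(ratio: float) -> float:
    """How many times smaller the new cost is than the baseline."""
    return 1.0 / ratio


# Hypothetical baseline: pretend V3.2 needed 100 GB of KV cache at 1M tokens.
HYPOTHETICAL_BASELINE_KV_GB = 100.0

kv_gb = scaled_cost(HYPOTHETICAL_BASELINE_KV_GB, KV_CACHE_RATIO)
print(f"KV cache: {kv_gb:.1f} GB ({reduction_factor(KV_CACHE_RATIO):.1f}x smaller)")
print(f"Inference FLOPs: {reduction_factor(FLOPS_RATIO):.1f}x fewer")
```

Under that reading, the KV cache shrinks by about 10x and inference compute by about 3.7x relative to V3.2 at the same context length.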

© funclosure 2025