DeepSeek V4 Flash

DeepSeek · DeepSeek V4

DeepSeek's low-cost V4 API route for 1M-context production assistants, agents, and compatibility migrations.

Part of DeepSeek V4 family · Other versions: DeepSeek V4 Pro
Type
language
Context
1M tokens
Max Output
384K tokens
Status
current
Input
$0.14/1M tok
Output
$0.28/1M tok
API Access
Yes
License
DeepSeek License
reasoning long-context cost-efficient agentic open-weights thinking-mode
Released April 2026 · Updated May 16, 2026

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.

DeepSeek V4 Flash is the lower-cost route in the DeepSeek V4 API family. It is the model most teams should evaluate first when they want DeepSeek’s newer 1M-context stack, compatibility with old DeepSeek API names, and low token pricing without jumping directly to the Pro tier.

The key operational detail is migration. DeepSeek’s official changelog says the legacy deepseek-chat and deepseek-reasoner names currently point to V4 Flash compatibility modes and are scheduled for discontinuation on July 24, 2026. New integrations should call deepseek-v4-flash explicitly.

Capabilities

V4 Flash supports both thinking and non-thinking modes, so it can cover cheap conversational assistants, long-context summarization, document analysis, and many agentic coding or workflow tasks. DeepSeek positions it as close to V4 Pro on simpler agent tasks while being smaller, faster, and more economical.

It is especially interesting for teams that need long context at low cost. A 1M-token window changes retrieval and document-processing economics, but the usual caveat still applies: long context does not replace good indexing, prompt boundaries, or evaluation.

Technical Details

Official anchors at this snapshot:

  • Model ID: deepseek-v4-flash.
  • 1,000,000 token context length.
  • 384,000 maximum output tokens.
  • Thinking and non-thinking modes.
  • OpenAI Chat Completions and Anthropic-compatible API formats.
  • JSON output, tool calls, chat prefix completion, and FIM completion in non-thinking mode.

Pricing & Access

DeepSeek lists V4 Flash API pricing per 1M tokens:

  • Cache-hit input: $0.0028
  • Cache-miss input: $0.14
  • Output: $0.28

Access is through DeepSeek’s API using the unchanged base URL, plus open-weight downloads from DeepSeek’s Hugging Face organization. Third-party gateways may expose different pricing, context limits, or safety controls.

Best Use Cases

Use DeepSeek V4 Flash for budget-sensitive agents, compatibility migrations away from deepseek-chat and deepseek-reasoner, long-context document workflows, and high-throughput internal assistants where V4 Pro is overkill.

It is less natural when the hardest reasoning quality matters more than cost, or when enterprise procurement, data-residency contracts, and support guarantees dominate the model decision.

Comparisons

  • DeepSeek V4 Pro: Stronger V4 route for harder reasoning; Flash is cheaper and better as the default throughput tier.
  • DeepSeek-Reasoner: Deprecated compatibility name that maps to V4 Flash thinking mode until the July 2026 discontinuation.
  • Gemini 2.5 Flash: Stronger Google ecosystem and multimodal tool integration; V4 Flash is more aggressive on long-context token economics.