DeepSeek V4

Family

DeepSeek · DeepSeek V4

DeepSeek's V4 family spans Pro and Flash routes for million-token reasoning, coding, and low-cost agents.

reasoning long-context cost-efficient open-weights agentic model-family
Updated May 16, 2026

Overview

Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on May 16, 2026.

DeepSeek V4 is the V-series successor that DeepSeek began rolling out on April 24, 2026. It is the line that replaces older deepseek-chat and deepseek-reasoner aliases, both of which DeepSeek has scheduled for discontinuation on July 24, 2026. V4 is sold as DeepSeek’s main long-context production family rather than a research preview, and the API exposes two distinct variants under it: DeepSeek V4 Pro for harder reasoning workloads and DeepSeek V4 Flash for cheaper, throughput-oriented routes.

This entry covers the V4 generation as a whole. Reach for it when the product question is “should we adopt the new DeepSeek V4 line?” rather than detailed differences between Pro and Flash.

Capabilities

DeepSeek’s public materials emphasize a few characteristics that distinguish V4 from the earlier R1 line:

  • Native 1M-token context window across both Pro and Flash variants, designed for long-context retrieval and analysis without separate “long-context” SKUs.
  • Thinking mode is on by default, with a non-thinking mode available for latency-sensitive routes.
  • Strong reasoning, math, and coding behavior, with V4-Pro positioned as competitive against current proprietary frontier reference points on standard benchmarks.
  • Continued open-weight availability, retaining DeepSeek’s pattern of pairing a hosted API with releases on Hugging Face for self-hosted deployment.

Operationally, V4 sits alongside the DeepSeek-R1 family rather than on top of it. Teams already running R1 for reasoning-specific workloads can keep that route while migrating chat and assistant traffic to V4.

Technical Details

Official anchors at this snapshot:

  • 1M token context window on both deepseek-v4-flash and deepseek-v4-pro.
  • 384K max output tokens.
  • Two API model IDs: deepseek-v4-flash and deepseek-v4-pro.
  • Thinking mode by default, with a non-thinking mode toggle.
  • OpenAI Chat Completions and Anthropic-compatible API formats.

Open-weight releases of the V4 family are published on DeepSeek’s Hugging Face organization under DeepSeek’s standard license, which keeps self-hosted, private, and air-gapped deployments viable for teams that need them.

Pricing & Access

Listed API pricing (per 1M tokens):

  • deepseek-v4-flash input cache miss: $0.14
  • deepseek-v4-flash input cache hit: $0.0028
  • deepseek-v4-flash output: $0.28
  • deepseek-v4-pro input cache miss: $1.74
  • deepseek-v4-pro input cache hit: $0.0145
  • deepseek-v4-pro output: $3.48

DeepSeek currently runs a 75% launch discount on deepseek-v4-pro through May 31, 2026 at 15:59 UTC. During the promo window, effective cache-miss input drops to roughly $0.435 per million tokens, and promo output drops to roughly $0.87 per million tokens. Production planning should still use the regular rates above for cost models that extend past the discount window.

Access options:

  • DeepSeek API (deepseek-v4-flash, deepseek-v4-pro)
  • Open-weight downloads on Hugging Face
  • Third-party inference hosts and gateways that support DeepSeek models

Best Use Cases

Choose DeepSeek V4 for:

  • Long-context retrieval, summarization, and analysis where a 1M-token window simplifies the retrieval architecture.
  • Cost-sensitive production assistants that still need solid reasoning behavior.
  • Open-weight deployments where regulatory, privacy, or sovereignty requirements rule out US-hosted closed APIs.
  • Teams already using earlier DeepSeek lines that need a planned migration target before deepseek-chat and deepseek-reasoner are deprecated in July 2026.

V4 is less of a fit when frontier-only intelligence is the primary requirement, when official enterprise support contracts and data residency commitments are mandatory, or when the workflow is built around tool-use ecosystems specific to OpenAI, Anthropic, or Google.

Comparisons

  • DeepSeek-R1 (DeepSeek): Reasoning-specialized line that remains available alongside V4 for analytical workloads.
  • Claude Opus 4.7 (Anthropic): Premium frontier alternative with stronger enterprise distribution and governance, at materially higher cost.
  • GPT-5.5 (OpenAI): Closed-source generalist flagship with broader product surfaces, while V4 leads on price and self-hostable open weights.