Claude Haiku 4.5
Anthropic · Claude 4
Fast and efficient Claude tier for latency-sensitive assistant and automation workloads.
Overview
Freshness note: Model capabilities, limits, and pricing can change quickly. This profile is a point-in-time snapshot last verified on April 18, 2026.
Claude Haiku 4.5 is Anthropic’s current small-model route for high-throughput workloads. Anthropic’s current model page describes it as the fastest and most cost-efficient Claude model, useful for teams optimizing latency and cost while retaining stronger quality than bare-minimum baseline models.
Capabilities
Anthropic explicitly positions Haiku 4.5 around speed, coding responsiveness, computer use, and parallelized subagent work. In practice, that makes it strongest in concise summarization, extraction, routing, lightweight coding, and customer-facing or internal operational flows that require quick response cycles.
Technical Details
Anthropic’s launch and current model page frame Haiku 4.5 as the throughput-oriented Claude where speed and predictable cost matter most. Anthropic also explicitly says Haiku 4.5 matches Sonnet 4 on coding, computer use, and agent tasks. The practical takeaway is not just “cheap assistant model.” It is a model that can handle coding, computer-use, and orchestration tasks at a price point that supports scaled deployment. It is still best used with explicit format constraints and robust post-validation for critical outputs.
Pricing & Access
Current Anthropic pricing for Claude Haiku 4.5 is 5 per 1M output tokens. Anthropic’s current model page also lists prompt-caching and batch-processing discounts. Access is available through Claude.ai on web, iOS, and Android, the Anthropic API, AWS Bedrock, Google Vertex AI, Microsoft Foundry, Claude products, and Claude Code.
Best Use Cases
Best for support summarization, ticket routing, knowledge normalization, coding subagents, and lightweight coding or documentation workflows where cost and speed dominate.
Comparisons
Compared with Claude Sonnet 4.6, Haiku 4.5 offers better cost and latency with a lower ceiling on complex reasoning. Compared with GPT-5 nano, selection depends on ecosystem and output-style fit. Compared with Gemini 2.5 Flash-Lite, both target efficient production use with different platform tradeoffs.