Llama 4 Scout

Meta · Llama 4

Meta's efficiency-focused Llama 4 MoE model with a headline 10M-token context window.

Part of Llama family · Other versions: Llama 4 Maverick
Type: multimodal
Context: 10M tokens
Max Output: 33K tokens
Status: current
API Access: Yes
License: Llama Community
open-weights efficient self-hosted automation customization moe long-context
Released April 2025 · Updated May 16, 2026

Overview

Freshness note: Model capabilities, deployment options, and licensing terms can change. This profile is a point-in-time snapshot last verified on May 16, 2026.

Llama 4 Scout is Meta’s efficiency-oriented Llama 4 model for teams that want to customize an open-weight model while keeping serving costs below those of larger open models. Meta’s official launch material describes Scout as a natively multimodal mixture-of-experts (MoE) model with 17B active parameters, 109B total parameters, 16 experts, and a headline 10-million-token context window.
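Because every expert must stay resident in memory while each token only passes through the active subset, Scout's memory footprint tracks its total parameters and its per-token compute tracks its active parameters. The Python sketch below is a back-of-envelope estimate under stated assumptions: the bytes-per-parameter values are generic precision sizes, the 2-FLOPs-per-active-parameter rule is a common approximation, and KV-cache, activations, and quantization overhead are ignored.

```python
# Back-of-envelope sizing for Llama 4 Scout using the parameter counts
# from Meta's launch material (109B total, 17B active).
# Assumptions: generic bytes-per-parameter values; KV-cache, activation
# memory, and quantization-scale overhead are NOT included.
TOTAL_PARAMS = 109e9
ACTIVE_PARAMS = 17e9

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(precision: str, params: float = TOTAL_PARAMS) -> float:
    """All experts must be loaded, so weight memory scales with TOTAL params."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def forward_flops_per_token(active_params: float = ACTIVE_PARAMS) -> float:
    """Common approximation: ~2 FLOPs per ACTIVE parameter per token."""
    return 2 * active_params


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: ~{weight_memory_gb(precision):.1f} GB of weights")
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
    print(f"~{forward_flops_per_token() / 1e9:.0f} GFLOPs per token")
```

Under these assumptions the weights alone take roughly 218 GB at bf16 and about 54.5 GB at 4-bit, which is why quantization choices dominate self-hosting plans for Scout.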

Capabilities

Scout is typically used for structured assistant tasks, summarization, extraction, multimodal understanding, and moderate reasoning workflows. Its headline differentiator is long context: Meta positions Scout for multi-document summarization, personalization based on extensive user activity, and reasoning over large codebases, where the model can use far more context than earlier open-weight models typically supported.
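Using a long window for multi-document summarization means budgeting tokens up front and leaving room for the model's reply. A minimal sketch, assuming a crude 4-characters-per-token heuristic (real counts depend on the Llama 4 tokenizer) and treating this profile's 10M-context and 33K-output figures as the limits; `pack_documents` and its parameters are hypothetical helpers, and hosted providers may expose a smaller window than the headline figure.

```python
from typing import List, Sequence, Tuple

CONTEXT_TOKENS = 10_000_000  # this profile's headline context figure
MAX_OUTPUT_TOKENS = 33_000   # this profile's max-output figure
CHARS_PER_TOKEN = 4          # hypothetical heuristic, not the Llama 4 tokenizer


def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pack_documents(
    docs: Sequence[str],
    context: int = CONTEXT_TOKENS,
    reserve_output: int = MAX_OUTPUT_TOKENS,
    prompt_overhead: int = 0,
) -> Tuple[List[str], int]:
    """Greedily keep documents, in order, while the estimated total fits.

    Documents that would overflow the budget are skipped (a real pipeline
    might chunk or summarize them instead). Returns (kept_docs, tokens_used).
    """
    budget = context - reserve_output - prompt_overhead
    kept: List[str] = []
    used = 0
    for doc in docs:
        cost = estimate_tokens(doc)
        if used + cost > budget:
            continue
        kept.append(doc)
        used += cost
    return kept, used


if __name__ == "__main__":
    corpus = ["quarterly report " * 50_000, "design doc " * 200_000, "changelog " * 10_000]
    kept, used = pack_documents(corpus, prompt_overhead=2_000)
    print(f"kept {len(kept)} of {len(corpus)} docs, ~{used:,} estimated tokens")
```

Reserving the output budget first matters because the context window is typically shared between the prompt and the generated reply.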

Technical Details

Meta’s official materials list multilingual text and image input, multilingual text and code output, and native multimodal pretraining, built on the 17B-active, 16-expert MoE design. Scout is the more accessible of the two released Llama 4 models, though real-world performance still depends on runtime optimizations, quantization, how the serving stack implements long context, and the quality of your evaluations.
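In an MoE model, a learned router picks which experts process each token. The sketch below is a generic softmax top-k router over 16 toy experts in pure Python; it illustrates the technique only and is not Meta's implementation (Llama 4's actual gating function, shared-expert arrangement, and layer layout are not reproduced here). All function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

NUM_EXPERTS = 16  # Scout's published expert count; everything else is a toy

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(logits: Sequence[float], k: int = 1) -> List[Tuple[int, float]]:
    """Pick the top-k experts and renormalize their gate weights to sum to 1."""
    probs = softmax(logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(
    x: Vector,
    gate_weights: Sequence[Vector],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 1,
) -> Vector:
    """Run only the selected experts and mix their outputs by gate weight."""
    logits = [sum(a * b for a, b in zip(x, w)) for w in gate_weights]
    out = [0.0] * len(x)
    for idx, weight in route(logits, k):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + weight * v for o, v in zip(out, y)]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 4
    # Toy experts: expert i scales its input by (i + 1).
    experts = [lambda v, s=i + 1: [s * t for t in v] for i in range(NUM_EXPERTS)]
    gate = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(NUM_EXPERTS)]
    print(moe_layer([1.0, 0.5, -0.5, 2.0], gate, experts, k=1))
```

With `k=1`, only one of the 16 expert functions runs per input, which is the intuition behind Scout's active parameter count being far smaller than its total; in a real transformer the active count also includes always-on components such as attention.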

Pricing & Access

There is no single price for Scout: it can be self-managed or accessed through many hosted inference partners, each setting its own rates. Teams should model both compute and operational overhead when comparing it against closed API alternatives.
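Comparing self-hosting against a per-token API usually comes down to a break-even volume. A minimal sketch, assuming a flat hourly GPU rate, a fixed monthly operations overhead, per-million-token API pricing, and a fixed output-to-input token ratio; every number in the example is hypothetical and is not a quoted price for Scout or any provider, and the sketch ignores whether the cluster can actually serve the computed volume.

```python
from dataclasses import dataclass


@dataclass
class SelfHosted:
    gpu_hourly_usd: float      # hypothetical rental or amortized hardware rate
    gpus: int
    ops_monthly_usd: float     # hypothetical staffing/monitoring overhead
    hours_per_month: float = 730.0

    def monthly_cost(self) -> float:
        return self.gpu_hourly_usd * self.gpus * self.hours_per_month + self.ops_monthly_usd


@dataclass
class HostedApi:
    usd_per_m_input: float     # hypothetical price per million input tokens
    usd_per_m_output: float    # hypothetical price per million output tokens

    def monthly_cost(self, input_tokens: float, output_tokens: float) -> float:
        return (input_tokens * self.usd_per_m_input + output_tokens * self.usd_per_m_output) / 1e6


def break_even_input_tokens(cluster: SelfHosted, api: HostedApi, output_ratio: float) -> float:
    """Monthly input-token volume at which both options cost the same,
    assuming output tokens = output_ratio * input tokens."""
    api_cost_per_input_token = (api.usd_per_m_input + output_ratio * api.usd_per_m_output) / 1e6
    return cluster.monthly_cost() / api_cost_per_input_token


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    cluster = SelfHosted(gpu_hourly_usd=2.0, gpus=1, ops_monthly_usd=540.0)
    api = HostedApi(usd_per_m_input=0.10, usd_per_m_output=0.40)
    tokens = break_even_input_tokens(cluster, api, output_ratio=0.25)
    print(f"self-hosted: ${cluster.monthly_cost():,.0f}/month")
    print(f"break-even at ~{tokens / 1e9:.1f}B input tokens/month")
```

With these made-up inputs the cluster costs $2,000 a month and breaks even at about 10B input tokens a month; below that volume the API is cheaper, above it self-hosting wins, provided the hardware keeps up.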

Best Use Cases

Strong fit for internal copilots, domain-specific automation, document-heavy assistants, and budget-constrained environments that still require control over deployment and data boundaries. Validate long-context retrieval behavior against your own data before assuming the full headline context window is useful in production.
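One common way to validate long-context retrieval is a needle-in-a-haystack probe: plant a known fact at several depths in a long filler context and check whether the model recovers it. A minimal sketch, assuming `generate` is a hypothetical callable wrapping whatever Scout endpoint or local runtime you use; for a realistic test, replace the filler with your own documents and scale `n_sentences` toward the context lengths you plan to run.

```python
import random
from typing import Callable, Dict, Iterable, Sequence


def build_haystack(filler: Sequence[str], needle: str, depth: float, n_sentences: int) -> str:
    """Place `needle` at relative position `depth` (0.0 = start, 1.0 = end)."""
    body = [filler[i % len(filler)] for i in range(n_sentences)]
    body.insert(round(depth * n_sentences), needle)
    return " ".join(body)


def needle_recall(
    generate: Callable[[str], str],
    filler: Sequence[str],
    depths: Iterable[float],
    n_sentences: int,
    seed: int = 0,
) -> Dict[float, bool]:
    """Return, for each depth, whether the model's answer contains the planted code."""
    rng = random.Random(seed)
    results: Dict[float, bool] = {}
    for depth in depths:
        secret = str(rng.randint(100000, 999999))  # fresh code per depth
        needle = f"The access code for the archive is {secret}."
        prompt = (
            build_haystack(filler, needle, depth, n_sentences)
            + "\n\nQuestion: What is the access code for the archive? Answer with the number only."
        )
        results[depth] = secret in generate(prompt)
    return results


if __name__ == "__main__":
    filler = ["The quarterly report was filed on time.", "Weather stayed mild all week."]

    def fake_model(prompt: str) -> str:
        # Stand-in for a real Scout call; it "cheats" by parsing the needle.
        return prompt.split("archive is ")[1].split(".")[0]

    print(needle_recall(fake_model, filler, depths=[0.0, 0.5, 1.0], n_sentences=1000))
```

Passing this probe is necessary but not sufficient: synthetic needles are easier than multi-hop questions over real documents, so pair it with task-specific evaluations.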

Comparisons

Compared with Llama 4 Maverick, Scout favors lower cost and easier deployment over maximum quality. Compared with GPT-5 mini or Gemini 2.5 Flash-Lite, Scout provides more control but usually needs more engineering investment to operate well. Compared with Qwen3.6-27B, Scout emphasizes the Llama ecosystem and long-context ambition, while Qwen is often easier to fit into smaller self-hosted deployments.