Llama 4 Scout
Meta · Llama 4
Efficiency-focused Llama 4 tier for customizable deployments under tighter compute budgets.
Overview
Freshness note: Model capabilities, deployment options, and licensing terms can change. This profile is a point-in-time snapshot last verified on February 15, 2026.
Llama 4 Scout is Meta’s efficiency-oriented open-weight Llama 4 model for teams that need customization at lower serving cost than larger open models. Meta’s official launch materials describe Scout as a natively multimodal mixture-of-experts (MoE) model with 17B active parameters across 16 experts and a 10 million token context window.
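The practical upside of the MoE design is that per-token compute tracks the active parameters, not the full parameter count. A minimal back-of-the-envelope sketch, using the common ~2 FLOPs-per-parameter-per-token heuristic for a forward pass (the heuristic and the framing are illustrative assumptions; only the 17B active-parameter figure comes from the launch materials):

```python
# Rough per-token forward-pass compute for an MoE model.
# Heuristic: ~2 FLOPs per active parameter per token (forward only).
# The 17B figure is from Meta's launch materials; the heuristic itself
# is a standard approximation, not a Meta-published number.

ACTIVE_PARAMS = 17e9  # parameters actually exercised per token via MoE routing

def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token (2 * params heuristic)."""
    return 2 * active_params

moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
print(f"~{moe_flops / 1e9:.0f} GFLOPs per token")  # ~34 GFLOPs per token
```

The same heuristic applied to a dense model of Scout's total size would give a several-times-larger per-token cost, which is the efficiency argument for the MoE tier.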
Capabilities
Scout is typically used for structured assistant tasks, summarization, extraction, multimodal understanding, and moderate reasoning workflows. It performs best when prompts and task domains are well defined.
Technical Details
Meta positions Scout as the more deployable of the two initial Llama 4 models, capable of running on a single H100 GPU with Int4 quantization. Performance outcomes still depend on runtime optimizations, serving stack choices, and evaluation quality.
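A quick way to sanity-check the single-H100 claim is to estimate weight memory at different precisions. The sketch below assumes a total parameter count of roughly 109B, a commonly cited figure for Scout that does not appear in this profile, so treat it as an assumption; it also counts weights only, ignoring KV cache and activations:

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB (weights only; ignores KV cache,
    activations, and runtime overhead)."""
    bytes_total = total_params * bits_per_param / 8
    return bytes_total / 1e9

# ~109B total parameters is an assumed figure for illustration.
TOTAL_PARAMS = 109e9
print(f"Int4: {weight_memory_gb(TOTAL_PARAMS, 4):.1f} GB")   # Int4: 54.5 GB
print(f"BF16: {weight_memory_gb(TOTAL_PARAMS, 16):.1f} GB")  # BF16: 218.0 GB
```

Under these assumptions, Int4 weights fit comfortably within an 80 GB H100 while BF16 weights do not, which is consistent with Meta's positioning of Scout as the more deployable model.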
Pricing & Access
There is no single universal pricing model because deployment can be self-managed or provider-hosted. Teams should model both compute and operational overhead when comparing against closed API alternatives.
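One simple way to start the comparison the paragraph above recommends is to convert a dedicated-GPU hourly rate into an effective cost per million tokens. All figures below (hourly rate, throughput, utilization) are hypothetical placeholders, not quoted prices:

```python
def self_hosted_cost_per_1m_tokens(gpu_hourly_usd: float,
                                   tokens_per_second: float,
                                   utilization: float = 0.5) -> float:
    """USD per 1M tokens for a dedicated GPU, given sustained throughput
    and average utilization. Excludes engineering and ops overhead."""
    effective_tps = tokens_per_second * utilization
    tokens_per_hour = effective_tps * 3600
    return gpu_hourly_usd / tokens_per_hour * 1e6

# Hypothetical inputs: $3.50/hr GPU, 1500 tok/s sustained, 50% utilization.
print(f"${self_hosted_cost_per_1m_tokens(3.50, 1500):.2f} per 1M tokens")
```

Even this rough model makes the key tradeoff visible: self-hosted per-token cost drops sharply with utilization, whereas API pricing is flat, so fleets with bursty or low traffic often compare poorly against hosted alternatives once operational overhead is added.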
Best Use Cases
Strong fit for internal copilots, domain-specific automation, and budget-constrained environments that still require control over deployment and data boundaries.
Comparisons
Compared with Llama 4 Maverick, Scout trades some peak quality for lower cost and higher throughput. Compared with GPT-5 nano or Gemini 2.5 Flash-Lite, Scout provides more control but typically requires more engineering investment to operate well.