Llama

Family

Meta's open-weight Llama family for self-hosting, fine-tuning, and privacy-conscious multimodal AI deployments.

Updated May 1, 2026

Overview

This is a model family overview. For version-specific details, see the individual model entries linked below.

Llama is Meta’s open-weight model family and one of the main engines behind the local and self-hosted AI ecosystem. The Llama 4 generation moves the family to natively multimodal mixture-of-experts (MoE) models, where earlier releases were mostly text-first. It keeps the gated but widely adopted Llama license model: the weights are downloadable once you accept Meta’s license terms.

Current Latest

Llama 4 is the current generation, led by Llama 4 Scout and Llama 4 Maverick. Meta’s official model cards describe both as natively multimodal models released on April 5, 2025, with 17B active parameters each, using 16 experts for Scout and 128 experts for Maverick.
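
For self-hosting, the 17B active-parameter figure mostly governs per-token compute, while memory planning depends on the total parameter count, because all experts' weights typically have to be loaded. The sketch below is a rough back-of-the-envelope estimate, not a sizing guide. The total counts it uses (about 109B for Scout and 400B for Maverick) are assumptions taken from Meta's model cards rather than from this page, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a mixture-of-experts model.
# Assumption: total parameter counts (in billions) come from Meta's model
# cards, not from this page; treat them as illustrative inputs.
ASSUMED_TOTAL_PARAMS_B = {"scout": 109, "maverick": 400}


def weight_memory_gb(total_params_b: float, bits_per_weight: int) -> float:
    """Return approximate weight storage in gigabytes (10**9 bytes).

    Ignores KV cache, activations, and quantization metadata.
    """
    if total_params_b <= 0 or bits_per_weight <= 0:
        raise ValueError("inputs must be positive")
    total_bytes = total_params_b * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    for name, params_b in ASSUMED_TOTAL_PARAMS_B.items():
        for bits in (16, 8, 4):
            print(f"{name}: {bits}-bit ~ {weight_memory_gb(params_b, bits):.1f} GB")
```

Under these assumptions, quantization is what brings the smaller model within reach of a single high-memory machine, which is why quantized community releases matter for local use.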

The largest planned Llama 4 variant, Llama 4 Behemoth, has not been released. Meta has paused its rollout and uses Behemoth internally as a teacher model for codistillation into Scout and Maverick. There is no firm release date as of this snapshot.

Strengths

  • Downloadable weights under Meta’s Llama Community License terms
  • Excellent option for local and on-premises deployment
  • Strong privacy properties when self-hosted
  • Native multimodal support in the Llama 4 generation
  • Active community ecosystem of quantized releases and local serving stacks

When to Choose Llama

  • Privacy-critical workloads — healthcare, legal, finance where data must stay local
  • Self-hosted deployments — on-premises or private cloud setups
  • Fine-tuning / adaptation — custom models for specific domains or tasks
  • Cost optimization — no per-token API costs after infrastructure investment
  • Offline or air-gapped environments — works without internet access
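
The cost-optimization point above comes down to a break-even calculation: self-hosting pays off once the API spend it avoids exceeds the fixed infrastructure cost. A minimal sketch follows. Every number in it is a hypothetical placeholder rather than a real price, and it ignores power, staffing, and hardware utilization.

```python
# Break-even estimate: hosted API (per-token pricing) vs self-hosting (fixed cost).
# All figures passed to this function here are hypothetical placeholders.


def break_even_tokens(infra_cost_usd: float, api_price_per_million_usd: float) -> float:
    """Tokens at which cumulative API spend equals the fixed infrastructure cost."""
    if infra_cost_usd < 0 or api_price_per_million_usd <= 0:
        raise ValueError("cost must be non-negative and price must be positive")
    return infra_cost_usd / api_price_per_million_usd * 1_000_000


if __name__ == "__main__":
    # Hypothetical: $20,000 of hardware vs an API priced at $0.50 per million tokens.
    tokens = break_even_tokens(20_000, 0.50)
    print(f"Break-even at about {tokens / 1e9:.0f} billion tokens")
```

Below the break-even volume a hosted API is likely cheaper, so the cost case for self-hosting mainly applies to sustained, high-volume workloads.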

Running Locally

The easiest ways to run Llama locally:

  • Ollama — Single-command setup, CLI-based, great for development
  • LM Studio — Desktop app with GUI, good for exploration
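
Once a model has been pulled, Ollama serves a local HTTP API (by default on port 11434) that scripts can call. The sketch below builds and sends a non-streaming request to the /api/generate endpoint using only the Python standard library. The model tag llama4:scout is an assumption, so check which tags your Ollama installation actually provides.

```python
import json
import urllib.request

# Assumptions: Ollama's default local address, and a model tag that may
# differ on your machine (list installed models with `ollama list`).
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "llama4:scout"


def build_generate_request(prompt: str, model: str = MODEL_TAG) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 300.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


if __name__ == "__main__":
    print(generate("Summarize what an open-weight model is in one sentence."))
```

Because the request only targets localhost, prompts and outputs stay on the machine, which matches the privacy and air-gapped use cases listed earlier.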

Access

  • Download the weights directly from Meta or from Hugging Face (both require accepting the license)
  • Via Ollama, LM Studio, and other local inference tools
  • Cloud providers: AWS, Azure, Google Cloud, Together AI, Fireworks AI
  • API access through many third-party providers