Llama
Meta's open-weight Llama family for self-hosting, fine-tuning, and privacy-conscious multimodal AI deployments.
Overview
This is a model family overview. For version-specific details, see the individual model entries linked below.
Llama is Meta's open-weight model family and among the most widely adopted openly licensed model series. It has driven much of the local and self-hosted AI ecosystem, and the Llama 4 generation shifts the family explicitly toward natively multimodal mixture-of-experts (MoE) models rather than text-first releases.
Current Latest
Llama 4 is the current generation, led by Llama 4 Scout and Llama 4 Maverick. Meta's official April 5, 2025 launch materials describe both as natively multimodal, open-weight mixture-of-experts models with 17B active parameters per token: Scout routes across 16 experts (109B total parameters), Maverick across 128 experts (400B total).
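To make the active-versus-total distinction concrete, here is a toy parameter-count sketch. The per-expert and shared sizes below are illustrative assumptions, not Llama 4's published breakdown; the point is only that with MoE routing, the parameters exercised per token are a small fraction of the total.

```python
# Illustrative sketch (not Meta's actual architecture): in a mixture-of-experts
# layer, a router activates only a few experts per token, so the "active"
# parameter count is far below the total parameter count.

def moe_param_counts(expert_params: int, num_experts: int,
                     experts_per_token: int, shared_params: int = 0):
    """Return (total, active) parameter counts for a toy MoE model.

    shared_params covers everything outside the expert blocks
    (attention, embeddings, shared FFN paths, etc.).
    """
    total = shared_params + expert_params * num_experts
    active = shared_params + expert_params * experts_per_token
    return total, active

# Toy numbers chosen for illustration only (not Llama 4's real breakdown):
total, active = moe_param_counts(expert_params=6_000_000_000,
                                 num_experts=16,
                                 experts_per_token=1,
                                 shared_params=11_000_000_000)
print(f"total: {total / 1e9:.0f}B, active: {active / 1e9:.0f}B")
```

With these toy numbers, 16 experts of 6B parameters each plus 11B shared parameters give a 107B-parameter model that activates only 17B parameters per token.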
Strengths
- Open weights — can be downloaded, hosted, and fine-tuned under the Llama Community License (which carries some commercial restrictions)
- Excellent option for local and on-premises deployment
- Strong privacy properties — when self-hosted, data never leaves your infrastructure
- Native multimodal support in the Llama 4 generation
- Vibrant community ecosystem across quantized releases and local serving stacks
When to Choose Llama
- Privacy-critical workloads — healthcare, legal, finance where data must stay local
- Self-hosted deployments — on-premises or private cloud setups
- Fine-tuning — custom models for specific domains or tasks
- Cost optimization — no per-token API costs after infrastructure investment
- Offline/air-gapped environments — works without internet access
Running Locally
The easiest ways to run Llama locally:
- Ollama — Single-command setup, CLI-based, great for development
- LM Studio — Desktop app with GUI, good for exploration
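A minimal local workflow with Ollama might look like the following. The model tag is an example; Llama 4 tags may differ, so check the Ollama model library for the names actually available.

```shell
# Pull a Llama model and run it interactively (tag is an assumption;
# see the Ollama library for current Llama tags).
ollama pull llama3.2
ollama run llama3.2 "Summarize the Llama model family in one sentence."

# Ollama also serves a local REST API on port 11434:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hello!",
  "stream": false
}'
```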
Access
- Download from Meta (Hugging Face, direct)
- Via Ollama, LM Studio, and other local inference tools
- Cloud providers: AWS, Azure, Google Cloud, Together AI, Fireworks AI
- API access through many third-party providers
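Many third-party hosts expose OpenAI-compatible chat-completions endpoints for Llama models, so switching providers often means changing only a base URL and model id. The sketch below only builds such a request with the standard library; the base URL and model id are illustrative assumptions, so check your provider's docs for the real values.

```python
# Hedged sketch: construct (but do not send) an OpenAI-style chat-completions
# request for a hosted Llama model. Endpoint path follows the common
# "/v1/chat/completions" convention; URL and model id below are placeholders.
import json

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str):
    """Return (url, headers, body) for an OpenAI-compatible chat request."""
    url = base_url.rstrip("/") + "/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Hypothetical provider URL and model id, for illustration only:
url, headers, body = build_chat_request(
    "https://api.example-provider.com/v1",
    "sk-...",
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "Hello!",
)
```

From here, the tuple can be passed to any HTTP client (e.g. `urllib.request` or `requests`) to perform the actual call.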