llama-models  by meta-llama

Utilities for Llama models

Created 1 year ago
7,257 stars

Top 7.1% on SourcePulse

Project Summary

This repository provides utilities and access to Meta's Llama family of large language models, designed for developers, researchers, and businesses to build and experiment with generative AI. It offers open access to cutting-edge LLMs, fostering a broad ecosystem and emphasizing trust and safety in AI development.

How It Works

The project offers access to multiple Llama model versions (Llama 2, 3, 3.1, 3.2, 3.3, and 4) across a range of parameter sizes and context lengths. Models use SentencePiece-based tokenizers (Llama 2) or tiktoken-based tokenizers (Llama 3 onward). Inference can be run natively with the provided scripts or via the Hugging Face transformers library. FP8 and Int4 quantization are available to reduce the memory footprint, each requiring a specific GPU configuration.
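The GPU configurations that each precision requires follow from simple arithmetic on bytes per weight. A minimal sketch (the ~109B parameter count and the 80 GiB-per-GPU figure below are illustrative assumptions, not numbers published by this repository):

```python
import math

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_vram_gib(params_billion: float, precision: str) -> float:
    """GiB needed for the weights alone (ignores KV cache and activations)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1024**3

def gpus_needed(params_billion: float, precision: str, gpu_gib: float = 80.0) -> int:
    """Minimum number of gpu_gib-sized GPUs just to hold the weights."""
    return math.ceil(weight_vram_gib(params_billion, precision) / gpu_gib)

# Hypothetical ~109B-parameter model served on 80 GiB GPUs:
for prec in ("bf16", "fp8", "int4"):
    gib = weight_vram_gib(109, prec)
    print(f"{prec}: ~{gib:.0f} GiB -> {gpus_needed(109, prec)} x 80 GiB GPU(s)")
```

Under these assumptions FP8 lands at roughly 101 GiB (two 80 GiB GPUs) and Int4 at roughly 51 GiB (a single GPU), consistent with the configurations listed below; real deployments also need headroom for the KV cache and activations, which is why full-precision serving needs more GPUs than the weights alone suggest.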

Quick Start & Requirements

  • Installation: pip install llama-models; add pip install llama-stack for the llama CLI.
  • Model Download: Request access via the Meta Llama website to receive a signed URL, then download with the llama-stack CLI (llama download). Alternatively, download weights from Hugging Face.
  • Prerequisites: A Python environment with PyTorch (torch). Llama 4 models need at least 4 GPUs for full-precision inference; FP8 quantization requires 2x 80GB GPUs, and Int4 requires a single 80GB GPU.
  • Resources: Significant VRAM and multiple GPUs are needed for larger models and higher precision inference.
  • Links: Meta Llama website, Llama Stack, Hugging Face.

Highlighted Details

  • Supports multiple Llama model versions, including Llama 4 with context lengths up to 10M tokens.
  • Offers FP8 and Int4 quantization for memory optimization.
  • Provides native inference scripts and Hugging Face transformers integration.
  • Includes a dedicated CLI tool (llama-stack) for model management.
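At very long context lengths, memory is dominated by the KV cache rather than the weights. A back-of-envelope sketch of how the cache scales with sequence length (the layer and head counts below are hypothetical, chosen only to illustrate the scaling, and are not this repository's published architecture):

```python
def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """GiB of K and V caches for one sequence (factor 2 = keys + values)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len / 1024**3

# Hypothetical config: 48 layers, 8 KV heads (grouped-query attention),
# head_dim 128, bf16 (2 bytes per element).
for ctx in (8_192, 128_000, 10_000_000):
    print(f"{ctx:>10,} tokens -> {kv_cache_gib(ctx, 48, 8, 128):.1f} GiB of KV cache")
```

With this hypothetical config, a single 10M-token sequence needs on the order of 1.8 TiB of KV cache in bf16, which is why serving at such context lengths typically relies on techniques like KV-cache quantization or offloading.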

Maintenance & Community

  • Active development with frequent model releases (e.g., Llama 4 in April 2025).
  • Community support via Discord: https://discord.gg/TZAAYNVtrU.
  • Issue reporting via GitHub issues.

Licensing & Compatibility

  • Model weights are licensed for researchers and commercial entities. Specific license details are available per model version (e.g., models/llama2/LICENSE).
  • Compatible with Hugging Face transformers library.

Limitations & Caveats

  • Access to model weights requires explicit request and approval from Meta.
  • Running larger models, especially at full precision, demands substantial GPU resources (multiple high-VRAM GPUs).
  • The project is a collection of utilities and model access points, not a single executable inference engine.
Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 5
  • Star History: 64 stars in the last 30 days

Explore Similar Projects

mlx-llm by riccardomusmeci

  • LLM tools/apps for Apple Silicon using MLX
  • 454 stars; top 0% on SourcePulse
  • Created 1 year ago; updated 7 months ago
  • Starred by Ross Wightman (author of timm; CV at Hugging Face), Awni Hannun (author of MLX; Research Scientist at Apple), and 1 more.

llama by meta-llama

  • Inference code for Llama 2 models (deprecated)
  • 59k stars; top 0.1% on SourcePulse
  • Created 2 years ago; updated 7 months ago
  • Starred by Roy Frostig (coauthor of JAX; Research Scientist at Google DeepMind), Zhiqiang Xie (coauthor of SGLang), and 40 more.