llama-models  by meta-llama

Utilities for Llama models

Created 1 year ago
7,257 stars

Top 7.1% on SourcePulse

Project Summary

This repository provides utilities and access to Meta's Llama family of large language models, designed for developers, researchers, and businesses to build and experiment with generative AI. It offers open access to cutting-edge LLMs, fostering a broad ecosystem and emphasizing trust and safety in AI development.

How It Works

The project offers access to multiple Llama model versions (Llama 2, 3, 3.1, 3.2, 3.3, and 4) across a range of parameter sizes and context lengths. Models use SentencePiece-based tokenizers (Llama 2) or tiktoken-based tokenizers (Llama 3 onward). Inference can be run natively with the provided scripts or via the Hugging Face transformers library. FP8 and Int4 quantization are available to reduce the memory footprint, each requiring a specific GPU configuration.
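The GPU configurations that each precision requires follow from simple arithmetic on bytes per weight. A minimal sketch (the ~109B parameter count and the 80 GiB-per-GPU figure below are illustrative assumptions, not numbers published by this repository):

```python
import math

# Bytes needed to store one weight at each precision.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_vram_gib(params_billion: float, precision: str) -> float:
    """GiB needed for the weights alone (ignores KV cache and activations)."""
    return params_billion * 1e9 * BYTES_PER_PARAM[precision] / 1024**3

def gpus_needed(params_billion: float, precision: str, gpu_gib: float = 80.0) -> int:
    """Minimum number of gpu_gib-sized GPUs just to hold the weights."""
    return math.ceil(weight_vram_gib(params_billion, precision) / gpu_gib)

# Hypothetical ~109B-parameter model served on 80 GiB GPUs:
for prec in ("bf16", "fp8", "int4"):
    gib = weight_vram_gib(109, prec)
    print(f"{prec}: ~{gib:.0f} GiB -> {gpus_needed(109, prec)} x 80 GiB GPU(s)")
```

Under these assumptions FP8 lands at roughly 101 GiB (two 80 GiB GPUs) and Int4 at roughly 51 GiB (a single GPU), consistent with the configurations listed below; real deployments also need headroom for the KV cache and activations, which is why full-precision serving needs more GPUs than the weights alone suggest.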

Quick Start & Requirements

  • Installation: pip install llama-models; add pip install llama-stack for the llama CLI.
  • Model Download: Request access via the Meta Llama website to receive a signed URL, then download with the llama-stack CLI (llama download). Alternatively, download weights from Hugging Face.
  • Prerequisites: A Python environment with PyTorch (torch). Llama 4 models need at least 4 GPUs for full-precision inference; FP8 quantization requires 2x 80GB GPUs, and Int4 requires a single 80GB GPU.
  • Resources: Significant VRAM and multiple GPUs are needed for larger models and higher precision inference.
  • Links: Meta Llama website, Llama Stack, Hugging Face.

Highlighted Details

  • Supports multiple Llama model versions, including Llama 4 with context lengths up to 10M tokens.
  • Offers FP8 and Int4 quantization for memory optimization.
  • Provides native inference scripts and Hugging Face transformers integration.
  • Includes a dedicated CLI tool (llama-stack) for model management.
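At very long context lengths, memory is dominated by the KV cache rather than the weights. A back-of-envelope sketch of how the cache scales with sequence length (the layer and head counts below are hypothetical, chosen only to illustrate the scaling, and are not this repository's published architecture):

```python
def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """GiB of K and V caches for one sequence (factor 2 = keys + values)."""
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len / 1024**3

# Hypothetical config: 48 layers, 8 KV heads (grouped-query attention),
# head_dim 128, bf16 (2 bytes per element).
for ctx in (8_192, 128_000, 10_000_000):
    print(f"{ctx:>10,} tokens -> {kv_cache_gib(ctx, 48, 8, 128):.1f} GiB of KV cache")
```

With this hypothetical config, a single 10M-token sequence needs on the order of 1.8 TiB of KV cache in bf16, which is why serving at such context lengths typically relies on techniques like KV-cache quantization or offloading.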

Maintenance & Community

  • Active development with frequent model releases (e.g., Llama 4 in April 2025).
  • Community support via Discord: https://discord.gg/TZAAYNVtrU.
  • Issue reporting via GitHub issues.

Licensing & Compatibility

  • Model weights are licensed for researchers and commercial entities. Specific license details are available per model version (e.g., models/llama2/LICENSE).
  • Compatible with Hugging Face transformers library.

Limitations & Caveats

  • Access to model weights requires explicit request and approval from Meta.
  • Running larger models, especially at full precision, demands substantial GPU resources (multiple high-VRAM GPUs).
  • The project is a collection of utilities and model access points, not a single executable inference engine.
Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 1
  • Issues (30d): 5
  • Star History: 64 stars in the last 30 days

Explore Similar Projects

mlx-llm by riccardomusmeci

  • LLM tools/apps for Apple Silicon using MLX
  • 454 stars; top 0% on SourcePulse
  • Created 1 year ago; updated 7 months ago
  • Starred by Ross Wightman (author of timm; CV at Hugging Face), Awni Hannun (author of MLX; Research Scientist at Apple), and 1 more.

llama by meta-llama

  • Inference code for Llama 2 models (deprecated)
  • 59k stars; top 0.1% on SourcePulse
  • Created 2 years ago; updated 7 months ago
  • Starred by Roy Frostig (coauthor of JAX; Research Scientist at Google DeepMind), Zhiqiang Xie (coauthor of SGLang), and 40 more.