llama-models by meta-llama

Utilities for Llama models

created 1 year ago
7,170 stars

Top 7.3% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides utilities and access to Meta's Llama family of large language models, designed for developers, researchers, and businesses to build and experiment with generative AI. It offers open access to cutting-edge LLMs, fostering a broad ecosystem and emphasizing trust and safety in AI development.

How It Works

The project offers access to multiple Llama model versions (Llama 2, 3, 3.1, 3.2, 3.3, and 4) at various parameter sizes and context lengths. Models use SentencePiece- or TikToken-based tokenizers. Inference can be run natively with the provided scripts or via the Hugging Face transformers library. FP8 and Int4 quantization are available to reduce memory footprint, each requiring specific GPU configurations.
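As an illustration of the tokenizer/prompt side, the sketch below renders a chat conversation in the Llama 3 instruct prompt format. The special-token names are those Meta published for Llama 3; the function itself is illustrative, not code from this repository, so verify against the tokenizer config of the model you actually use:

```python
# Sketch of the Llama 3 instruct prompt format. Special tokens are from
# Meta's published Llama 3 format; check your model's tokenizer config.

def format_llama3_chat(messages):
    """Render a list of {"role", "content"} dicts into a raw prompt string."""
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"] + "<|eot_id|>"
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

prompt = format_llama3_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
])
print(prompt)
```

In practice the Hugging Face tokenizer's chat template applies this formatting for you; the sketch just makes the wire format visible.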

Quick Start & Requirements

  • Installation: pip install llama-models and pip install llama-stack for CLI access.
  • Model Download: Requires requesting access via the Meta Llama website, receiving a signed URL, and using the llama-stack CLI (llama download). Alternatively, download from Hugging Face.
  • Prerequisites: a Python environment with torch. Llama 4 models require at least 4 GPUs for full-precision inference; FP8 quantization needs 2x 80GB GPUs and Int4 needs 1x 80GB GPU.
  • Resources: Significant VRAM and multiple GPUs are needed for larger models and higher precision inference.
  • Links: Meta Llama website, Llama Stack, Hugging Face.
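The GPU counts above can be sanity-checked with a rough weight-memory estimate. The ~109B total parameter count (roughly what has been reported for Llama 4 Scout) and the per-precision byte sizes are my assumptions, and the estimate covers weights only; real inference also needs room for activations and the KV cache, which is why full precision calls for an extra GPU beyond this floor:

```python
import math

# Back-of-envelope weight-memory estimate at different precisions.
# Ignores activations, KV cache, and framework overhead, so real
# requirements are higher; numbers are illustrative only.

BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params_billion: float, precision: str) -> float:
    """Decimal GB needed just to hold the weights."""
    return n_params_billion * BYTES_PER_PARAM[precision]

# A hypothetical ~109B-parameter model:
for prec in ("bf16", "fp8", "int4"):
    gb = weight_memory_gb(109, prec)
    gpus = math.ceil(gb / 80)  # 80GB cards, weights only, no headroom
    print(f"{prec}: ~{gb:.0f} GB of weights -> >= {gpus} x 80GB GPU(s)")
```

The FP8 (2 GPUs) and Int4 (1 GPU) floors match the stated requirements; at bf16 the weights-only floor is 3 cards, with the fourth absorbing activation and cache overhead.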

Highlighted Details

  • Supports multiple Llama model versions, including Llama 4 with context lengths up to 10M tokens.
  • Offers FP8 and Int4 quantization for memory optimization.
  • Provides native inference scripts and Hugging Face transformers integration.
  • Includes a dedicated CLI (the llama command, installed via llama-stack) for model management.

Maintenance & Community

  • Active development with frequent model releases (e.g., Llama 4 in April 2025).
  • Community support via Discord: https://discord.gg/TZAAYNVtrU.
  • Issue reporting via GitHub issues.

Licensing & Compatibility

  • Model weights are licensed for researchers and commercial entities. Specific license details are available per model version (e.g., models/llama2/LICENSE).
  • Compatible with Hugging Face transformers library.

Limitations & Caveats

  • Access to model weights requires explicit request and approval from Meta.
  • Running larger models, especially at full precision, demands substantial GPU resources (multiple high-VRAM GPUs).
  • The project is a collection of utilities and model access points, not a single executable inference engine.
Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 3
  • Issues (30d): 6
Star History

  • 279 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (cofounder of Shopify), Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), and 10 more.

qlora by artidoro

Finetuning tool for quantized LLMs

  • 11k stars, top 0.2% on sourcepulse
  • created 2 years ago, updated 1 year ago