Utilities for Llama models
This repository provides utilities and access to Meta's Llama family of large language models, designed for developers, researchers, and businesses to build and experiment with generative AI. It offers open access to cutting-edge LLMs, fostering a broad ecosystem and emphasizing trust and safety in AI development.
How It Works
The project offers access to various Llama model versions (Llama 2, 3, 3.1, 3.2, 3.3, and 4) with different parameter sizes and context lengths. Models use SentencePiece- or tiktoken-based tokenizers. Inference can be run natively using the provided scripts or via the Hugging Face `transformers` library. Advanced features include FP8 and Int4 quantization for a reduced memory footprint, which require specific GPU configurations.
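As an illustration of the tiktoken-based chat format used by the Llama 3 family, here is a minimal sketch that assembles a prompt by hand. The special-token names (`<|begin_of_text|>`, `<|start_header_id|>`, `<|eot_id|>`) follow Llama 3's published prompt format; the helper name is hypothetical, and in practice you would use the repo's tokenizer or `transformers`' chat templating instead.

```python
# Sketch of Llama 3-style chat prompt assembly (hypothetical helper;
# real code should use the repository's tokenizer or
# transformers' apply_chat_template).

def format_llama3_prompt(messages):
    """messages: list of {"role": ..., "content": ...} dicts."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn is wrapped in role headers and terminated with <|eot_id|>.
        parts.append(f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n")
        parts.append(m["content"])
        parts.append("<|eot_id|>")
    # Open an assistant header so the model generates the reply next.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

demo = format_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
assert demo.startswith("<|begin_of_text|>")
assert demo.endswith("<|start_header_id|>assistant<|end_header_id|>\n\n")
```

Earlier Llama 2 models use a different, SentencePiece-based format (`[INST]`/`[/INST]` wrapping), so the template above applies only to Llama 3-style checkpoints.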
Quick Start & Requirements
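In shell form, the setup steps below look roughly like this (the exact `llama download` arguments vary by version; consult `llama download --help` before running):

```shell
# Install the model utilities, plus the CLI package for model management.
pip install llama-models
pip install llama-stack

# Fetch model weights with the llama-stack CLI; arguments selecting the
# model and source are version-dependent, so check the built-in help.
llama download --help
```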
- Install with `pip install llama-models`, plus `pip install llama-stack` for CLI access.
- Download model weights via the `llama-stack` CLI (`llama download`), or download them from Hugging Face.
- Inference requires `torch`. Llama 4 models need at least 4 GPUs for full-precision inference; FP8 quantization requires 2x 80GB GPUs, and Int4 requires 1x 80GB GPU (roughly tracking the ~2, 1, and 0.5 bytes per weight needed at BF16, FP8, and Int4 precision).

Highlighted Details
- Hugging Face `transformers` integration.
- CLI tooling (`llama-stack`) for model management.

Maintenance & Community
Licensing & Compatibility
- Models are released under custom Llama community licenses (e.g., `models/llama2/LICENSE`).
- Compatible with the Hugging Face `transformers` library.

Limitations & Caveats