huggingface-llama-recipes by huggingface

Recipes for Llama 3 models

Created 1 year ago
680 stars

Top 49.8% on SourcePulse

Project Summary

This repository provides minimal, runnable examples for quickly getting started with Meta's Llama 3.x family of models, including Llama 3.1, 3.2, and 3.3. It targets developers and researchers looking to experiment with Llama models for inference, fine-tuning, and advanced use cases like assisted decoding and RAG, offering a practical entry point to these powerful LLMs.

How It Works

The recipes are built on the Hugging Face transformers library for tight integration with Llama models. They demonstrate core functionality such as text generation via the pipeline API, local inference with various quantization schemes (4-bit, 8-bit, AWQ, GPTQ), and fine-tuning with PEFT and TRL. Throughout, the emphasis is on practical, code-first examples that enable rapid adoption and experimentation.
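The pipeline-based generation style the recipes demonstrate can be sketched as follows; the choice of the gated meta-llama/Llama-3.1-8B-Instruct checkpoint is our assumption, and the heavy imports are kept inside the main guard so the formatting helper stays importable without GPU dependencies:

```python
# Minimal sketch of pipeline-based chat generation, assuming access to the
# gated meta-llama/Llama-3.1-8B-Instruct checkpoint (model choice is ours).

def build_chat(user_prompt: str) -> list[dict]:
    """Single-turn chat in the message format transformers' pipeline expects."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    import torch
    from transformers import pipeline

    # Downloads ~16 GB of bf16 weights; a CUDA GPU is strongly recommended.
    pipe = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.1-8B-Instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    out = pipe(build_chat("Explain KV caching in one sentence."), max_new_tokens=64)
    print(out[0]["generated_text"][-1]["content"])
```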

Quick Start & Requirements

  • Install: pip install -U transformers
  • Prerequisites: CUDA-enabled GPU recommended for optimal performance. Access to Llama models requires accepting their license and requesting permission via Hugging Face.
  • Resources: Memory requirements vary significantly by model size and quantization (e.g., Llama 3.1 8B in 4-bit requires ~4GB).
  • Links: Hugging Face announcement blog post (3.1), Open Source AI Cookbook
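The memory figure above follows from a simple back-of-envelope rule: weight memory is parameter count times bits per parameter. A quick sketch (weights only; activations and the KV cache add overhead on top):

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight-only memory: params x bits / 8, in decimal GB."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Llama 3.1 8B at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(8, bits):.0f} GB")
# -> 16-bit: ~16 GB, 8-bit: ~8 GB, 4-bit: ~4 GB
```

The same arithmetic explains why the 405B variant is out of reach for single-GPU setups: even at 4-bit it needs roughly 200 GB for weights alone.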

Highlighted Details

  • Supports Llama 3.1, 3.2, and 3.3 variants, including the large 405B parameter model.
  • Demonstrates advanced techniques like assisted decoding for up to 2x speedup and integration with Llama Guard for safety.
  • Includes recipes for fine-tuning on custom datasets and building RAG pipelines.
  • Covers performance optimizations using torch.compile and KV cache quantization.
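The assisted-decoding speedup mentioned above uses transformers' `assistant_model` argument to `generate`. A hedged sketch, assuming a hypothetical pairing of a Llama 3.2 1B draft with a Llama 3.1 8B target (any smaller model sharing the tokenizer can serve as the draft):

```python
# Sketch of assisted (speculative) decoding in transformers, assuming the
# gated meta-llama checkpoints below; the 1B/8B pairing is our assumption.
TARGET = "meta-llama/Llama-3.1-8B-Instruct"
DRAFT = "meta-llama/Llama-3.2-1B-Instruct"

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(TARGET)
    model = AutoModelForCausalLM.from_pretrained(
        TARGET, torch_dtype=torch.bfloat16, device_map="auto"
    )
    assistant = AutoModelForCausalLM.from_pretrained(
        DRAFT, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tok("Speculative decoding works by", return_tensors="pt").to(model.device)
    # The draft proposes several tokens per step; the target verifies them in
    # a single forward pass, which is where the up-to-2x latency win comes from.
    out = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))
```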

Maintenance & Community

This repository is actively maintained by Hugging Face. Further community engagement and updates can be found via Hugging Face's official channels.

Licensing & Compatibility

The recipes themselves are likely under a permissive license (e.g., Apache 2.0), but the use of Llama models is governed by Meta's Llama license, which may have restrictions on commercial use and redistribution.

Limitations & Caveats

The repository is explicitly marked as "WIP" (Work In Progress), indicating potential for frequent changes and instability. Access to Llama models is gated by Meta's approval process.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech) and Clément Renault (Cofounder of Meilisearch).

lm.rs by samuel-vitorino
1k stars · 0%
Minimal LLM inference in Rust
Created 1 year ago · Updated 10 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab
2k stars · 10.6%
Speculative decoding research paper for faster LLM inference
Created 1 year ago · Updated 1 week ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 5 more.

GPTQ-for-LLaMa by qwopqwop200
3k stars · 0.0%
4-bit quantization for LLaMA models using GPTQ
Created 2 years ago · Updated 1 year ago
Starred by Roy Frostig (Coauthor of JAX; Research Scientist at Google DeepMind), Zhiqiang Xie (Coauthor of SGLang), and 40 more.

llama by meta-llama
59k stars · 0.1%
Inference code for Llama 2 models (deprecated)
Created 2 years ago · Updated 7 months ago