huggingface-llama-recipes by huggingface

Recipes for Llama 3 models

Created 1 year ago
680 stars

Top 49.8% on SourcePulse

Project Summary

This repository provides minimal, runnable examples for quickly getting started with Meta's Llama 3.x family of models, including Llama 3.1, 3.2, and 3.3. It targets developers and researchers looking to experiment with Llama models for inference, fine-tuning, and advanced use cases like assisted decoding and RAG, offering a practical entry point to these powerful LLMs.

How It Works

The recipes are built on the Hugging Face transformers library for tight integration with Llama models. They demonstrate core functionality such as text generation via the pipeline API, local inference with various quantization schemes (4-bit, 8-bit, AWQ, GPTQ), and fine-tuning with PEFT and TRL. Throughout, the emphasis is on practical, code-first examples that enable rapid adoption and experimentation.
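The pipeline-based generation style the recipes demonstrate can be sketched as follows; the choice of the gated meta-llama/Llama-3.1-8B-Instruct checkpoint is our assumption, and the heavy imports are kept inside the main guard so the formatting helper stays importable without GPU dependencies:

```python
# Minimal sketch of pipeline-based chat generation, assuming access to the
# gated meta-llama/Llama-3.1-8B-Instruct checkpoint (model choice is ours).

def build_chat(user_prompt: str) -> list[dict]:
    """Single-turn chat in the message format transformers' pipeline expects."""
    return [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": user_prompt},
    ]

if __name__ == "__main__":
    import torch
    from transformers import pipeline

    # Downloads ~16 GB of bf16 weights; a CUDA GPU is strongly recommended.
    pipe = pipeline(
        "text-generation",
        model="meta-llama/Llama-3.1-8B-Instruct",
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )
    out = pipe(build_chat("Explain KV caching in one sentence."), max_new_tokens=64)
    print(out[0]["generated_text"][-1]["content"])
```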

Quick Start & Requirements

  • Install: pip install -U transformers
  • Prerequisites: CUDA-enabled GPU recommended for optimal performance. Access to Llama models requires accepting their license and requesting permission via Hugging Face.
  • Resources: Memory requirements vary significantly by model size and quantization (e.g., Llama 3.1 8B in 4-bit requires ~4GB).
  • Links: Hugging Face announcement blog post (3.1), Open Source AI Cookbook
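The memory figure above follows from a simple back-of-envelope rule: weight memory is parameter count times bits per parameter. A quick sketch (weights only; activations and the KV cache add overhead on top):

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight-only memory: params x bits / 8, in decimal GB."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

# Llama 3.1 8B at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(8, bits):.0f} GB")
# -> 16-bit: ~16 GB, 8-bit: ~8 GB, 4-bit: ~4 GB
```

The same arithmetic explains why the 405B variant is out of reach for single-GPU setups: even at 4-bit it needs roughly 200 GB for weights alone.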

Highlighted Details

  • Supports Llama 3.1, 3.2, and 3.3 variants, including the large 405B parameter model.
  • Demonstrates advanced techniques like assisted decoding for up to 2x speedup and integration with Llama Guard for safety.
  • Includes recipes for fine-tuning on custom datasets and building RAG pipelines.
  • Covers performance optimizations using torch.compile and KV cache quantization.
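The assisted-decoding speedup mentioned above uses transformers' `assistant_model` argument to `generate`. A hedged sketch, assuming a hypothetical pairing of a Llama 3.2 1B draft with a Llama 3.1 8B target (any smaller model sharing the tokenizer can serve as the draft):

```python
# Sketch of assisted (speculative) decoding in transformers, assuming the
# gated meta-llama checkpoints below; the 1B/8B pairing is our assumption.
TARGET = "meta-llama/Llama-3.1-8B-Instruct"
DRAFT = "meta-llama/Llama-3.2-1B-Instruct"

if __name__ == "__main__":
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained(TARGET)
    model = AutoModelForCausalLM.from_pretrained(
        TARGET, torch_dtype=torch.bfloat16, device_map="auto"
    )
    assistant = AutoModelForCausalLM.from_pretrained(
        DRAFT, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tok("Speculative decoding works by", return_tensors="pt").to(model.device)
    # The draft proposes several tokens per step; the target verifies them in
    # a single forward pass, which is where the up-to-2x latency win comes from.
    out = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))
```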

Maintenance & Community

This repository is actively maintained by Hugging Face. Further community engagement and updates can be found via Hugging Face's official channels.

Licensing & Compatibility

The recipes themselves are likely under a permissive license (e.g., Apache 2.0), but the use of Llama models is governed by Meta's Llama license, which may have restrictions on commercial use and redistribution.

Limitations & Caveats

The repository is explicitly marked as "WIP" (Work In Progress), indicating potential for frequent changes and instability. Access to Llama models is gated by Meta's approval process.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech) and Clément Renault (Cofounder of Meilisearch).

lm.rs by samuel-vitorino
1k stars · 0%
Minimal LLM inference in Rust
Created 1 year ago · Updated 10 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 8 more.

EAGLE by SafeAILab
2k stars · 10.6%
Speculative decoding research paper for faster LLM inference
Created 1 year ago · Updated 1 week ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 5 more.

GPTQ-for-LLaMa by qwopqwop200
3k stars · 0.0%
4-bit quantization for LLaMA models using GPTQ
Created 2 years ago · Updated 1 year ago
Starred by Roy Frostig (Coauthor of JAX; Research Scientist at Google DeepMind), Zhiqiang Xie (Coauthor of SGLang), and 40 more.

llama by meta-llama
59k stars · 0.1%
Inference code for Llama 2 models (deprecated)
Created 2 years ago · Updated 7 months ago