llama by meta-llama

Inference code for Llama 2 models (deprecated)

created 2 years ago
58,577 stars

Top 0.4% on sourcepulse

View on GitHub
Project Summary

This repository provides inference code for Meta's Llama models, specifically Llama 2. It's designed for researchers and businesses to load and run pre-trained and fine-tuned language models, ranging from 7B to 70B parameters, enabling experimentation and application development.

How It Works

The project utilizes PyTorch and a model-parallelism approach for efficient inference. It allows loading model weights and tokenizers, with specific scripts for text completion and chat-based interactions. The architecture supports varying model-parallel (MP) values depending on model size (7B=1, 13B=2, 70B=8) and allows customization of sequence length and batch size for hardware optimization.
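
A minimal sketch of text completion with this repo's API, based on its example scripts; the checkpoint and tokenizer paths are placeholders for wherever you downloaded the weights. The script is launched via torchrun, with the process count set to the model's MP value (1 for the 7B model here):

    # example_completion.py (hypothetical filename)
    # Run with: torchrun --nproc_per_node 1 example_completion.py
    from llama import Llama

    generator = Llama.build(
        ckpt_dir="llama-2-7b/",            # placeholder path to downloaded weights
        tokenizer_path="tokenizer.model",  # placeholder path to the tokenizer
        max_seq_len=128,                   # lower these to fit smaller GPUs
        max_batch_size=4,
    )

    results = generator.text_completion(
        ["The theory of relativity states that"],
        max_gen_len=64,
        temperature=0.6,
        top_p=0.9,
    )
    print(results[0]["generation"])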

Quick Start & Requirements

  • Install: pip install -e . within a conda environment with PyTorch/CUDA.
  • Prerequisites: wget, md5sum, PyTorch with CUDA support. Model weights must be downloaded separately from Meta's website after accepting their license.
  • Resources: Requires downloading model weights (size varies by parameter count).
  • Links: llama-models, llama-cookbook.

Highlighted Details

  • Supports Llama 2 models from 7B to 70B parameters.
  • Includes example scripts for both raw text completion and chat-based inference.
  • Fine-tuned chat models require specific prompt formatting ([INST], <<SYS>>, BOS, and EOS tokens); see the sketch after this list.
  • Model weights require a separate download process via a signed URL from Meta.
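
As an illustration of that chat format (the repo's chat_completion applies this formatting internally, so this sketch is for reference only):

    # Llama 2 chat prompt format; tag strings follow the repo's generation code.
    B_INST, E_INST = "[INST]", "[/INST]"
    B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

    system = "You are a helpful assistant."  # example system prompt
    user = "What is the capital of France?"  # example user turn

    # The system prompt is folded into the first user turn; the tokenizer
    # adds the BOS (<s>) and EOS (</s>) tokens around each turn.
    prompt = f"{B_INST} {B_SYS}{system}{E_SYS}{user} {E_INST}"
    print(prompt)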

Maintenance & Community

This repository is deprecated in favor of a consolidated Llama Stack. New development and support are directed to llama-models, PurpleLlama, llama-toolchain, llama-agentic-system, and llama-cookbook. Issues can be filed on these new repositories.

Licensing & Compatibility

Model weights and code are licensed for both research and commercial use. An Acceptable Use Policy is provided.

Limitations & Caveats

This repository is deprecated. Users are directed to use the new Llama Stack repositories for current development and support. The README notes that testing has not covered all potential use scenarios, and users should consult the Responsible Use Guide.

Health Check
Last commit

6 months ago

Responsiveness

1+ week

Pull Requests (30d)
1
Issues (30d)
4
Star History
644 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 3 more.

LLaMA-Adapter by OpenGVLab

0.0%
6k
Efficient fine-tuning for instruction-following LLaMA models
created 2 years ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Travis Fischer (Founder of Agentic), and 6 more.

codellama by meta-llama

0.1%
16k
Inference code for CodeLlama models
created 1 year ago
updated 11 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

0.0%
19k
LoRA fine-tuning for LLaMA
created 2 years ago
updated 1 year ago