llama by meta-llama

Inference code for Llama 2 models (deprecated)

Created 2 years ago
58,760 stars

Top 0.4% on SourcePulse

Project Summary

This repository provides inference code for Meta's Llama models, specifically Llama 2. It's designed for researchers and businesses to load and run pre-trained and fine-tuned language models, ranging from 7B to 70B parameters, enabling experimentation and application development.

How It Works

The project uses PyTorch with model parallelism for inference. It provides scripts for loading model weights and tokenizers, with separate examples for raw text completion and chat-based interaction. Each checkpoint is sharded to a fixed model-parallel (MP) degree that depends on model size (7B=1, 13B=2, 70B=8), and sequence length and batch size can be tuned to the available hardware, as in the launch sketch below.
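
For example, the 13B checkpoints are sharded across two model-parallel processes, so the torchrun process count must match. This is a sketch based on the README's example commands; the checkpoint and tokenizer paths are placeholders for wherever the weights were downloaded:

    # --nproc_per_node must equal the model's MP value (13B -> 2)
    torchrun --nproc_per_node 2 example_chat_completion.py \
        --ckpt_dir llama-2-13b-chat/ \
        --tokenizer_path tokenizer.model \
        --max_seq_len 512 --max_batch_size 6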

Quick Start & Requirements

  • Install: pip install -e . inside a conda environment with PyTorch/CUDA available (a condensed setup sketch follows this list).
  • Prerequisites: wget, md5sum, PyTorch with CUDA support. Model weights must be downloaded separately from Meta's website after accepting their license.
  • Resources: Requires downloading model weights (size varies by parameter count).
  • Links: llama-models, llama-cookbook.
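
A condensed sketch of the documented setup flow, assuming a CUDA-capable machine. The environment name and checkpoint paths are illustrative, and the presigned download URL is the one Meta emails after license acceptance:

    conda create -n llama python=3.10 -y && conda activate llama
    git clone https://github.com/meta-llama/llama && cd llama
    pip install -e .
    ./download.sh    # prompts for the presigned URL and desired model sizes
    # Raw text completion on the 7B model (MP=1, so a single process):
    torchrun --nproc_per_node 1 example_text_completion.py \
        --ckpt_dir llama-2-7b/ \
        --tokenizer_path tokenizer.model \
        --max_seq_len 128 --max_batch_size 4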

Highlighted Details

  • Supports Llama 2 models from 7B to 70B parameters.
  • Includes example scripts for both raw text completion and chat-based inference.
  • Fine-tuned chat models require a specific prompt format built from INST and <<SYS>> tags plus BOS and EOS tokens (see the template sketch after this list).
  • Model weights require a separate download process via a signed URL from Meta.
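
For reference, a single-turn prompt in the documented chat format looks roughly like this; in the repo, the chat_completion helper assembles it automatically, with BOS (<s>) and EOS (</s>) supplied by the tokenizer. The bracketed names below are placeholders:

    <s>[INST] <<SYS>>
    {system_prompt}
    <</SYS>>

    {user_message} [/INST]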

Maintenance & Community

This repository is deprecated in favor of the consolidated Llama Stack. New development and support have moved to llama-models, PurpleLlama, llama-toolchain, llama-agentic-system, and llama-cookbook; issues should be filed against those repositories.

Licensing & Compatibility

Model weights and code are licensed for both research and commercial use. An Acceptable Use Policy governs permitted applications.

Limitations & Caveats

This repository is deprecated; users are directed to the new Llama Stack repositories for current development and support. The README notes that testing has not covered all use scenarios, and users should consult the Responsible Use Guide.

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 3
  • Star History: 175 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), George Hotz (author of tinygrad; founder of the tiny corp and comma.ai), and 20 more.

TinyLlama by jzhang38

Tiny pretraining project for a 1.1B Llama model
Top 0.1% on SourcePulse · 9k stars
Created 2 years ago · Updated 1 year ago