open_llama by openlm-research

Open-source reproduction of LLaMA models

Created 2 years ago
7,529 stars

Top 6.9% on SourcePulse

View on GitHub
Project Summary

OpenLLaMA provides open-source reproductions of Meta AI's LLaMA models (3B, 7B, and 13B parameters), trained on 1T tokens using permissively licensed datasets. It targets researchers and developers seeking LLaMA-compatible models without restrictive licensing, offering PyTorch and JAX weights for broad integration.

How It Works

OpenLLaMA replicates the LLaMA architecture and training methodology, including hyperparameters and context length. The v1 models are trained on the RedPajama dataset, while the v2 models mix Falcon, StarCoder, and parts of RedPajama. This keeps the models compatible with existing LLaMA implementations while relying only on openly available data. Training was performed on TPU-v4s with the JAX-based EasyLM framework, combining data parallelism with fully sharded data parallelism (ZeRO stage 3) for efficiency.
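Because the architecture and hyperparameters mirror LLaMA, OpenLLaMA checkpoints load through the standard LLaMA classes in Transformers. A minimal sketch, assuming the openlm-research/open_llama_3b Hub id, that inspects the shared config:

    # Illustrative only: the Hub id is assumed from the project's organization name.
    from transformers import AutoConfig

    config = AutoConfig.from_pretrained("openlm-research/open_llama_3b")
    print(config.model_type)              # expected: "llama"
    print(config.hidden_size,             # architecture hyperparameters mirror LLaMA
          config.num_hidden_layers,
          config.num_attention_heads,
          config.max_position_embeddings) # context length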

Quick Start & Requirements

  • Install/Run: Load the checkpoints through Hugging Face Transformers (the transformers library); see the sketch after this list.
  • Prerequisites: PyTorch and transformers. For v2 models, avoid the auto-converted fast tokenizer; use LlamaTokenizer or AutoTokenizer.from_pretrained(..., use_fast=False).
  • Resources: Enough VRAM for the chosen model size (e.g., the 7B model loaded with torch_dtype=torch.float16).
  • Docs: Hugging Face Transformers LLaMA documentation
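A minimal loading sketch, assuming the openlm-research/open_llama_7b Hub id (check the repository for exact checkpoint names) and the accelerate package for device placement:

    import torch
    from transformers import LlamaTokenizer, LlamaForCausalLM

    model_id = "openlm-research/open_llama_7b"  # assumed Hub id; see the repo README

    # The plain LlamaTokenizer avoids the fast-tokenizer pitfall noted above for v2 models.
    tokenizer = LlamaTokenizer.from_pretrained(model_id)
    model = LlamaForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # ~14 GB of VRAM for the 7B weights
        device_map="auto",          # requires the accelerate package
    )

    prompt = "Q: What is the largest animal?\nA:"
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))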

Highlighted Details

  • Permissively licensed under Apache 2.0.
  • v2 models offer improved performance and dataset mixture.
  • The v1 tokenizer merges multiple spaces, which can hurt code generation tasks.
  • Performance comparable to the original LLaMA and GPT-J across a range of benchmarks.

Maintenance & Community

Developed by Xinyang Geng and Hao Liu from Berkeley AI Research. Feedback is welcomed via GitHub issues.

Licensing & Compatibility

Apache 2.0 license. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The v1 tokenizer's handling of whitespace (merging consecutive spaces) may cause issues with code generation tasks. The project also reports suspected benchmark data contamination for its models on two tasks (CB and WSC).
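A quick round-trip check (illustrative; the Hub id is assumed) shows whether a given checkpoint's tokenizer preserves consecutive spaces before relying on it for code:

    from transformers import AutoTokenizer

    model_id = "openlm-research/open_llama_7b"  # assumed Hub id
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)

    text = "def f():\n    return 1"  # indentation relies on consecutive spaces
    round_trip = tokenizer.decode(tokenizer.encode(text), skip_special_tokens=True)
    print(repr(text))
    print(repr(round_trip))  # merged indentation here reflects the v1 whitespace caveat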

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

17 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0.4%
265
Efficiently train foundation models with PyTorch
Created 1 year ago
Updated 1 month ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

picotron by huggingface

4.8%
2k
Minimalist distributed training framework for educational use
Created 1 year ago
Updated 3 weeks ago
Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 12 more.

EasyLM by young-geng

0.0%
2k
LLM training/finetuning framework in JAX/Flax
Created 2 years ago
Updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), and 20 more.

TinyLlama by jzhang38

0.1%
9k
Tiny pretraining project for a 1.1B Llama model
Created 2 years ago
Updated 1 year ago