open_llama by openlm-research

Open-source reproduction of LLaMA models

created 2 years ago
7,514 stars

Top 7.1% on sourcepulse

Project Summary

OpenLLaMA provides open-source reproductions of Meta AI's LLaMA models (3B, 7B, and 13B parameters), trained on 1T tokens using permissively licensed datasets. It targets researchers and developers seeking LLaMA-compatible models without restrictive licensing, offering PyTorch and JAX weights for broad integration.

How It Works

OpenLLaMA replicates the LLaMA architecture and training methodology, including hyperparameters and context length. The v1 models use the RedPajama dataset, while v2 models incorporate Falcon, StarCoder, and parts of RedPajama. This approach ensures compatibility with existing LLaMA implementations while utilizing openly available data. Training was performed on TPU-v4s using the JAX-based EasyLM framework, employing data parallelism and ZeRO stage 3 for efficiency.

Quick Start & Requirements

  • Install/Run: Load via the Hugging Face transformers library (see the sketch after this list).
  • Prerequisites: PyTorch and transformers. For v2 models, avoid the fast tokenizer: use LlamaTokenizer, or AutoTokenizer.from_pretrained(..., use_fast=False).
  • Resources: Requires GPU memory proportional to model size (e.g., the 7B model loaded with torch_dtype=torch.float16 needs roughly 14 GB for the weights alone).
  • Docs: Hugging Face Transformers LLaMA documentation
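
A minimal loading-and-generation sketch, assuming the Hugging Face checkpoint name openlm-research/open_llama_7b_v2; the prompt and generation settings are illustrative, not prescribed by the project:

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

# v2 checkpoint; LlamaTokenizer avoids the fast-tokenizer issue noted above.
model_path = 'openlm-research/open_llama_7b_v2'
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map='auto',
)

prompt = 'Q: What is the largest animal?\nA:'
input_ids = tokenizer(prompt, return_tensors='pt').input_ids.to(model.device)
output = model.generate(input_ids=input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```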

Highlighted Details

  • Permissively licensed under Apache 2.0.
  • v2 models offer improved performance and dataset mixture.
  • The v1 tokenizer merges consecutive spaces, which can hurt code generation tasks.
  • Comparable performance to original LLaMA and GPT-J on various benchmarks.

Maintenance & Community

Developed by Xinyang Geng and Hao Liu from Berkeley AI Research. Feedback is welcomed via GitHub issues.

Licensing & Compatibility

Apache 2.0 license. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The v1 tokenizer merges consecutive spaces before tokenization, which can cause issues with code generation tasks (illustrated in the sketch below). The project also notes potential benchmark data contamination on specific tasks (CB, WSC) for its models.
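
A small sketch of the v1 tokenizer caveat, assuming the openlm-research/open_llama_7b checkpoint; the collapsed output shown in the comment reflects the behavior the project describes, not a verified transcript:

```python
from transformers import LlamaTokenizer

# v1 tokenizer: configured to merge runs of spaces before tokenization.
tokenizer = LlamaTokenizer.from_pretrained('openlm-research/open_llama_7b')

code = "def f(x):\n        return x"  # eight-space indent
ids = tokenizer(code, return_tensors='pt').input_ids
print(tokenizer.decode(ids[0], skip_special_tokens=True))
# Per the project's notes, consecutive spaces collapse to one
# (e.g. "def f(x):\n return x"), which breaks indentation-sensitive code.
```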

Health Check
Last commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 53 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 3 more.

LLaMA-Adapter by OpenGVLab

0.0%
6k
Efficient fine-tuning for instruction-following LLaMA models
created 2 years ago
updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), George Hotz (Author of tinygrad; founder of the tiny corp, comma.ai), and 10 more.

TinyLlama by jzhang38

0.3%
9k
Tiny pretraining project for a 1.1B Llama model
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Travis Fischer (Founder of Agentic), and 6 more.

codellama by meta-llama

0.1%
16k
Inference code for CodeLlama models
created 1 year ago
updated 11 months ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

alpaca-lora by tloen

0.0%
19k
LoRA fine-tuning for LLaMA
created 2 years ago
updated 1 year ago