Open-source reproduction of LLaMA models
Top 7.1% on sourcepulse
OpenLLaMA provides open-source reproductions of Meta AI's LLaMA models (3B, 7B, and 13B parameters), trained on 1T tokens using permissively licensed datasets. It targets researchers and developers seeking LLaMA-compatible models without restrictive licensing, offering PyTorch and JAX weights for broad integration.
How It Works
OpenLLaMA replicates the LLaMA architecture and training methodology, including hyperparameters and context length. The v1 models use the RedPajama dataset, while v2 models incorporate Falcon, StarCoder, and parts of RedPajama. This approach ensures compatibility with existing LLaMA implementations while utilizing openly available data. Training was performed on TPU-v4s using the JAX-based EasyLM framework, employing data parallelism and ZeRO stage 3 for efficiency.
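As a sketch of that compatibility, the checkpoints load through the standard LLaMA classes in transformers, and the replicated hyperparameters can be read off the model config. The Hub repo name below is assumed for illustration, not taken from this summary:

```python
from transformers import AutoConfig

# Illustrative checkpoint name; any published OpenLLaMA repo on the Hugging Face Hub works.
config = AutoConfig.from_pretrained("openlm-research/open_llama_7b")

# These fields mirror the original LLaMA hyperparameters.
print(config.model_type)               # "llama" - same architecture family
print(config.hidden_size)              # hidden dimension
print(config.num_hidden_layers)        # transformer layer count
print(config.num_attention_heads)      # attention heads
print(config.max_position_embeddings)  # context length (2048 in the original LLaMA recipe)
```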
Quick Start & Requirements
Weights can be loaded with the Hugging Face transformers library. For v2 models, avoid the fast tokenizer; use LlamaTokenizer or AutoTokenizer.from_pretrained(..., use_fast=False). Loading in half precision (torch_dtype=torch.float16) roughly halves the memory footprint.
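A minimal loading sketch, assuming a recent transformers release and an OpenLLaMA checkpoint hosted on the Hugging Face Hub (the repo name below is illustrative):

```python
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

# Illustrative checkpoint; substitute the OpenLLaMA weights you downloaded
# or the corresponding Hugging Face Hub repo.
model_path = "openlm-research/open_llama_3b_v2"

# Slow tokenizer only: the fast tokenizer should be avoided for v2 models.
tokenizer = LlamaTokenizer.from_pretrained(model_path)

# Half precision on GPU roughly halves memory; fall back to float32 on CPU.
dtype = torch.float16 if torch.cuda.is_available() else torch.float32
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=dtype)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```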
Highlighted Details
Maintenance & Community
Developed by Xinyang Geng and Hao Liu of Berkeley AI Research. Feedback is welcomed via GitHub issues. The repository was last updated roughly two years ago and is marked inactive.
Licensing & Compatibility
Apache 2.0 license. Permissive for commercial use and integration with closed-source projects.
Limitations & Caveats
The v1 tokenizer merges consecutive whitespace, which can break code generation tasks that depend on exact indentation. The authors also note potential evaluation-data contamination on specific tasks (CB, WSC) for their models.
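A quick way to check whether the whitespace behavior matters for a given workload is to round-trip an indentation-sensitive snippet through the tokenizer and compare; the checkpoint names below are assumed for illustration:

```python
from transformers import LlamaTokenizer

# Indentation-sensitive snippet: if the tokenizer collapses runs of spaces,
# the decoded text is no longer valid Python.
snippet = "def add(a, b):\n    return a + b"

for name in ("openlm-research/open_llama_7b", "openlm-research/open_llama_7b_v2"):
    tok = LlamaTokenizer.from_pretrained(name)  # slow tokenizer, as recommended
    ids = tok(snippet, add_special_tokens=False).input_ids
    print(name)
    print(repr(tok.decode(ids)))
```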