TinyLlama by jzhang38

Tiny pretraining project for a 1.1B Llama model

Created 2 years ago
8,750 stars

Top 5.9% on SourcePulse

Project Summary

The TinyLlama project provides a 1.1 billion parameter Llama-compatible language model pre-trained on 3 trillion tokens. It is aimed at researchers and developers who need a compact yet capable LLM for compute-constrained applications, such as deployment on edge devices or real-time dialogue generation, and at those studying LLM training dynamics.

How It Works

TinyLlama uses the Llama 2 architecture and tokenizer and incorporates optimizations such as Flash Attention 2, fused LayerNorm, SwiGLU, and rotary positional embeddings. These choices yield high training throughput (24k tokens/sec/A100) and a reduced memory footprint, enabling training on consumer hardware and efficient inference. The extensive pre-training on 3 trillion tokens aims to push the limits of smaller models and probe where they saturate.
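
For orientation, the architecture can be expressed as a Hugging Face LlamaConfig. The hyperparameters below are a sketch of TinyLlama's published configuration and should be checked against the config.json shipped with the released checkpoints.

    # Sketch: approximate TinyLlama hyperparameters expressed as a LlamaConfig.
    # Values are illustrative; verify against the checkpoint's config.json.
    from transformers import LlamaConfig, LlamaForCausalLM

    config = LlamaConfig(
        vocab_size=32000,              # Llama 2 tokenizer vocabulary
        hidden_size=2048,
        intermediate_size=5632,        # SwiGLU feed-forward width
        num_hidden_layers=22,
        num_attention_heads=32,
        num_key_value_heads=4,         # grouped-query attention
        max_position_embeddings=2048,
    )
    model = LlamaForCausalLM(config)   # randomly initialized, for inspection only
    print(f"{sum(p.numel() for p in model.parameters()) / 1e9:.2f}B parameters")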

Quick Start & Requirements

  • Install: Models are available on Hugging Face. Usage typically involves libraries such as transformers, llama.cpp, or vLLM (see the loading sketch after this list).
  • Requirements: Python, PyTorch. Specific inference frameworks may have their own dependencies (e.g., CUDA for GPU acceleration).
  • Resources: 4-bit quantized models require ~637MB VRAM. Training requires significant GPU resources (e.g., 16x A100-40G for pre-training).
  • Links: Chat Demo, Discord, EVAL.md, PRETRAIN.md.
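
A minimal loading sketch with transformers is shown below. The model id is assumed to be the chat checkpoint published under the TinyLlama organization on Hugging Face (confirm the exact name there); 4-bit loading via bitsandbytes is optional.

    # Sketch: load and query a TinyLlama chat checkpoint with transformers.
    # The model id is an assumption; confirm it on Hugging Face.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,   # or pass a BitsAndBytesConfig for 4-bit (~637MB VRAM)
        device_map="auto",
    )

    # Format a single-turn chat prompt using the checkpoint's chat template.
    prompt = tokenizer.apply_chat_template(
        [{"role": "user", "content": "Summarize what TinyLlama is in one sentence."}],
        tokenize=False,
        add_generation_prompt=True,
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=128)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))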

Highlighted Details

  • 1.1B parameter model trained on 3T tokens, matching the Llama 2 architecture and tokenizer.
  • Achieves 24k tokens/sec/A100 training throughput with fused optimizations.
  • Offers intermediate checkpoints at various token milestones (105B to 3T).
  • Includes fine-tuned chat models and speculative decoding examples (a rough equivalent is sketched after this list).
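
The repo's speculative decoding examples pair TinyLlama with a larger Llama model. A rough equivalent using transformers' assisted generation is sketched below; the 7B target and the draft checkpoint id are assumptions for illustration, not necessarily the repo's exact setup. Both models must share the Llama 2 tokenizer, which TinyLlama does by design.

    # Sketch: speculative decoding via transformers' assisted generation,
    # with TinyLlama as the small draft model and a larger Llama 2 model as verifier.
    # Both model ids below are assumed; the 7B checkpoint is gated on Hugging Face.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    target_id = "meta-llama/Llama-2-7b-hf"
    draft_id = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"

    tokenizer = AutoTokenizer.from_pretrained(target_id)
    target = AutoModelForCausalLM.from_pretrained(target_id, torch_dtype=torch.float16, device_map="auto")
    draft = AutoModelForCausalLM.from_pretrained(draft_id, torch_dtype=torch.float16, device_map="auto")

    inputs = tokenizer("Speculative decoding speeds up generation by", return_tensors="pt").to(target.device)
    # The draft model proposes several tokens per step; the target model verifies them,
    # so the output matches what the target model alone would produce.
    outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))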

Maintenance & Community

  • Active development with community contributions welcomed.
  • Discord server available for discussion.
  • Roadmap includes adding pre-training scripts, sequence length extrapolation, and mobile demos.

Licensing & Compatibility

  • The model weights are released under the Apache 2.0 license.
  • Compatible with Llama 2 ecosystem projects.

Limitations & Caveats

  • The project is under active development, with some planned features yet to be implemented.
  • While extensive, 3T tokens may not represent full saturation for a 1.1B model, as indicated by ongoing research.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 53 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

picotron by huggingface

Top 4.8% on SourcePulse · 2k stars
Minimalist distributed training framework for educational use
Created 1 year ago · Updated 3 weeks ago
Starred by Jiaming Song (Chief Scientist at Luma AI), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 6 more.

LLaMA-Adapter by OpenGVLab

Top 0.1% on SourcePulse · 6k stars
Efficient fine-tuning for instruction-following LLaMA models
Created 2 years ago · Updated 1 year ago
Starred by Roy Frostig (Coauthor of JAX; Research Scientist at Google DeepMind), Zhiqiang Xie (Coauthor of SGLang), and 40 more.

llama by meta-llama

Top 0.1% on SourcePulse · 59k stars
Inference code for Llama 2 models (deprecated)
Created 2 years ago · Updated 7 months ago