TinyLlama by jzhang38

Tiny pretraining project for a 1.1B Llama model

created 1 year ago
8,676 stars

Top 6.0% on sourcepulse

Project Summary

The TinyLlama project provides a 1.1 billion parameter Llama-compatible language model pre-trained on 3 trillion tokens. It's designed for researchers and developers needing a compact yet capable LLM for applications with restricted computational resources, such as edge devices or real-time dialogue generation, and for studying LLM training dynamics.

How It Works

TinyLlama uses the Llama 2 architecture and tokenizer, incorporating optimizations such as Flash Attention 2, fused LayerNorm, SwiGLU, and rotary positional embeddings. These optimizations yield high training throughput (24k tokens/sec per A100) and a reduced memory footprint, enabling training on consumer hardware and efficient inference. The extensive pre-training on 3 trillion tokens aims to push the limits of what smaller models can absorb and to explore where they saturate.
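
As a rough illustration of the Llama 2 compatibility (this is not the project's training code, and the model ID below is an assumption, one of the TinyLlama repositories on Hugging Face), the published checkpoint can be inspected with transformers:

    # Minimal sketch: confirm the checkpoint exposes the standard Llama
    # architecture and tokenizer. The model ID is an assumed Hugging Face repo.
    from transformers import AutoConfig, AutoTokenizer

    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed model ID

    config = AutoConfig.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    print(config.model_type)            # "llama" -- same architecture family as Llama 2
    print(config.hidden_size,           # 1.1B-scale dimensions
          config.num_hidden_layers,
          config.num_attention_heads)
    print(tokenizer.vocab_size)         # Llama 2 tokenizer vocabulary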

Quick Start & Requirements

  • Install: Models are available on Hugging Face. Usage typically involves libraries such as transformers, llama.cpp, or vLLM (a minimal inference sketch follows this list).
  • Requirements: Python, PyTorch. Specific inference frameworks may have their own dependencies (e.g., CUDA for GPU acceleration).
  • Resources: 4-bit quantized models require ~637MB VRAM. Training requires significant GPU resources (e.g., 16x A100-40G for pre-training).
  • Links: Chat Demo, Discord, EVAL.md, PRETRAIN.md.
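
The sketch below shows 4-bit quantized inference with transformers and bitsandbytes, which ties in with the ~637MB VRAM figure above. The model ID, chat-template usage, and prompt are assumptions for illustration, not project-provided code.

    # Minimal sketch of 4-bit quantized inference. Assumes transformers,
    # accelerate, and bitsandbytes are installed; the model ID is an assumed
    # Hugging Face repository name.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed model ID

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.float16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",
    )

    # Build a chat prompt and generate a short reply.
    messages = [{"role": "user", "content": "Summarize TinyLlama in one sentence."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))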

Highlighted Details

  • 1.1B parameter model trained on 3T tokens, matching Llama 2 architecture.
  • Achieves 24k tokens/sec/A100 throughput with fused optimizations.
  • Offers intermediate checkpoints at various token milestones (105B to 3T).
  • Includes fine-tuned chat models and speculative decoding examples (see the sketch after this list).
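
As an illustration of the speculative-decoding use case, the sketch below uses transformers' assisted generation rather than the project's own examples; both model IDs are assumptions, and the Llama 2 target repository is gated on Hugging Face.

    # Minimal sketch: TinyLlama as the small draft model for assisted generation
    # with a larger Llama 2 target. This works because both models share the
    # Llama 2 tokenizer. Both model IDs are assumptions for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    target_id = "meta-llama/Llama-2-7b-hf"           # assumed target (gated repo)
    draft_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # assumed draft model

    tokenizer = AutoTokenizer.from_pretrained(target_id)
    target = AutoModelForCausalLM.from_pretrained(
        target_id, torch_dtype=torch.float16, device_map="auto"
    )
    draft = AutoModelForCausalLM.from_pretrained(
        draft_id, torch_dtype=torch.float16, device_map="auto"
    )

    inputs = tokenizer("Speculative decoding works by", return_tensors="pt").to(target.device)
    # assistant_model drafts candidate tokens that the target verifies in parallel.
    output = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
    print(tokenizer.decode(output[0], skip_special_tokens=True))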

Maintenance & Community

  • Active development with community contributions welcomed.
  • Discord server available for discussion.
  • Roadmap includes adding pre-training scripts, sequence length extrapolation, and mobile demos.

Licensing & Compatibility

  • The model weights are released under the Apache 2.0 license.
  • Compatible with Llama 2 ecosystem projects.

Limitations & Caveats

  • The project is under active development, with some planned features yet to be implemented.
  • While extensive, 3T tokens may not represent full saturation for a 1.1B model, as indicated by ongoing research.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 268 stars in the last 90 days
