Tiny pretraining project for a 1.1B Llama model
The TinyLlama project provides a 1.1-billion-parameter, Llama-compatible language model pre-trained on 3 trillion tokens. It targets researchers and developers who need a compact yet capable LLM: for deployment under tight compute budgets (e.g., edge devices), for latency-sensitive applications such as real-time dialogue generation, and for studying LLM training dynamics.
How It Works
TinyLlama reuses the Llama 2 architecture and tokenizer and incorporates optimizations such as Flash Attention 2, fused LayerNorm, SwiGLU, and rotary positional embeddings. These choices yield high training throughput (24k tokens per second per A100) and a reduced memory footprint, enabling training on consumer hardware and efficient inference. The extensive pre-training on 3 trillion tokens aims to push the boundaries of smaller models and to probe where they saturate.
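Because the checkpoints follow the standard Llama layout, the architecture can be inspected directly from the published config. The sketch below is a minimal illustration assuming the transformers library and the TinyLlama/TinyLlama-1.1B-Chat-v1.0 checkpoint id on the Hugging Face Hub; any other TinyLlama checkpoint should expose the same fields.

```python
from transformers import AutoConfig

# Assumed checkpoint id; swap in whichever TinyLlama checkpoint you use.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# TinyLlama reuses the Llama 2 architecture, so this resolves to a standard LlamaConfig.
config = AutoConfig.from_pretrained(model_id)

print(config.model_type)           # expected: "llama"
print(config.hidden_size)          # embedding width
print(config.num_hidden_layers)    # number of transformer blocks
print(config.num_attention_heads)  # query heads
print(config.num_key_value_heads)  # key/value heads (grouped-query attention)
print(config.vocab_size)           # shared Llama 2 tokenizer vocabulary size
```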
Quick Start & Requirements
The released checkpoints can be loaded for inference with transformers, llama.cpp, or vLLM, as sketched below.
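The following is a minimal inference sketch using transformers; the checkpoint id and the use of a chat template are assumptions based on the chat-tuned release, so adjust them to the checkpoint you actually pull.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; substitute the TinyLlama checkpoint you intend to use.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Format a single-turn prompt with the checkpoint's chat template (assumed to ship with the chat model).
messages = [{"role": "user", "content": "Summarize what TinyLlama is in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

# Generate a short completion; at 1.1B parameters the bf16 weights fit in a few GB of memory.
output_ids = model.generate(input_ids, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

For llama.cpp the checkpoint is typically converted to GGUF first, while vLLM can serve the Hugging Face checkpoint directly; both paths are omitted from the sketch.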
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats