LLM project for training a large language model from scratch
This project provides a comprehensive guide and codebase for training a 1-billion-parameter large language model (LLM) from scratch, covering pre-training, supervised fine-tuning (SFT), and direct preference optimization (DPO). It targets researchers and developers who want to understand and replicate the LLM training pipeline under resource constraints, and shows that training is feasible even on consumer-grade GPUs such as the T4.
How It Works
The project builds upon the Qwen2.5-0.5B-Instruct model, expanding its architecture and initializing parameters randomly. It utilizes a curated dataset of 16B tokens for pre-training, 9M examples for SFT, and 60K for DPO, sourced from reputable institutions. Training employs flash_attention_2 for acceleration and DeepSpeed for distributed training, achieving efficient training runs on multiple H800 GPUs. The project also explores concepts like scaling laws, the "repeater phenomenon," and knowledge injection during fine-tuning.
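The expand-and-reinitialize step described above can be sketched with the Hugging Face transformers API. This is a minimal illustration, not the project's actual code; the width, depth, and head counts below are assumptions chosen only to land roughly near 1B parameters:

```python
# Sketch: enlarge the Qwen2.5-0.5B-Instruct architecture, initialize weights
# randomly (no pretrained load), and enable flash_attention_2.
# All size values are illustrative assumptions, not the project's configuration.
import torch
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-0.5B-Instruct"
config = AutoConfig.from_pretrained(base)

# Scale up width and depth (assumed values, kept consistent: 1536 / 12 heads = 128 head_dim).
config.hidden_size = 1536
config.intermediate_size = 6144
config.num_hidden_layers = 24
config.num_attention_heads = 12
config.num_key_value_heads = 4

# Build the enlarged model from the config with randomly initialized weights.
model = AutoModelForCausalLM.from_config(
    config,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires a matching flash-attn build
)
tokenizer = AutoTokenizer.from_pretrained(base)

print(f"parameters: {sum(p.numel() for p in model.parameters()) / 1e9:.2f}B")
```

From here, the randomly initialized model would be pre-trained on the token corpus (typically launched through DeepSpeed for multi-GPU runs) before the SFT and DPO stages.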
Quick Start & Requirements
pip install flash-attn trl==0.11.4 transformers==4.45.0
python demo/demo_pt.py
python demo/demo_sft.py
python demo/demo_dpo.py
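As a rough illustration of what a DPO script like demo/demo_dpo.py does with trl 0.11.4, here is a minimal sketch; the checkpoint path, toy dataset, and hyperparameters are assumptions for illustration, not the project's actual settings:

```python
# Sketch of a minimal DPO run with trl 0.11.4 (assumed setup, not the project's script).
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_path = "path/to/sft_checkpoint"  # assumed: the SFT model produced in the previous stage
model = AutoModelForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Preference data needs "prompt", "chosen", "rejected" columns; a single toy example here.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["France is a country in Europe."],
})

args = DPOConfig(
    output_dir="dpo_out",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    beta=0.1,              # DPO temperature; tuning matters, per the caveats below
    max_length=1024,
    max_prompt_length=512,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,        # trl creates a frozen copy of the policy as the reference model
    args=args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```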
Highlighted Details
Maintenance & Community
The project is actively maintained by qiufengqijun. Community interaction and discussion are encouraged.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Users should verify licensing for commercial or closed-source use.
Limitations & Caveats
The project notes that DPO did not significantly improve performance and may even degrade it in some configurations, suggesting that careful hyperparameter tuning and data quality are critical for RLHF stages. The "repeater phenomenon" persists even in fine-tuned models, though it is somewhat mitigated. The project also highlights potential compatibility issues with specific versions of trl and transformers, and the need to match flash-attn with the correct PyTorch and CUDA versions.