Distributed trainer for LLMs
This repository provides a distributed training framework for large language models (LLMs), enabling pre-training and fine-tuning at scale. It is a fork of Nvidia's Megatron-LM, enhanced with support for modern architectures like Llama, Mistral, and Falcon, and optimized for training large models on commodity hardware. The primary audience is researchers and engineers working with LLMs who need efficient distributed training capabilities.
How It Works
The framework leverages a three-way parallelism strategy (tensor, pipeline, and data parallelism) inherited from Megatron-LM. It incorporates advanced techniques such as grouped-query attention (GQA), multi-query attention (MQA), rotary position embeddings (RoPE) with scaling for longer contexts, RMSNorm, LIMA dropout, and FlashAttention 2 for improved performance and efficiency. Support for BF16/FP16 training and seamless integration with Hugging Face models further enhance its utility.
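As a rough illustration of the head-sharing idea behind GQA and MQA (a minimal sketch, not the framework's actual implementation; the function name and tensor shapes are invented for the example), each key/value head is broadcast to a group of query heads:

```python
import torch

def grouped_query_attention(q, k, v):
    """Toy grouped-query attention (single batch, no mask, no RoPE).

    q: (n_q_heads, seq, head_dim)
    k, v: (n_kv_heads, seq, head_dim), where n_kv_heads divides n_q_heads.
    MQA is the special case n_kv_heads == 1; standard multi-head attention
    is n_kv_heads == n_q_heads.
    """
    n_q_heads, _, head_dim = q.shape
    n_kv_heads = k.shape[0]
    group_size = n_q_heads // n_kv_heads  # query heads sharing one KV head

    # Broadcast each KV head to its group of query heads.
    k = k.repeat_interleave(group_size, dim=0)
    v = v.repeat_interleave(group_size, dim=0)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return torch.softmax(scores, dim=-1) @ v

# 8 query heads sharing 2 KV heads (group size 4) shrinks the KV cache 4x.
q, k, v = torch.randn(8, 16, 64), torch.randn(2, 16, 64), torch.randn(2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([8, 16, 64])
```

The practical payoff is a much smaller key/value cache at inference time with little quality loss, which is why architectures such as Llama 2 70B and Mistral adopt GQA.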
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Further documentation and usage instructions are available within the docs/ directory.
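When sizing a multi-node run, the product of the tensor- and pipeline-parallel degrees must divide the total GPU count; the remaining factor becomes the data-parallel replica count. A minimal sketch of that arithmetic (the function name and numbers are illustrative, not part of the framework's CLI):

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Derive the data-parallel degree of a Megatron-style 3D parallel layout."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel:
        raise ValueError("world size must be divisible by TP * PP")
    return world_size // model_parallel

# Example: 64 GPUs with TP=8 and PP=2 leave 4 data-parallel replicas.
print(data_parallel_size(64, tensor_parallel=8, pipeline_parallel=2))  # 4
```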
Highlighted Details
Maintenance & Community
The project is actively maintained by a team of researchers from EPFL. Notable models trained using this framework include TOWER, Meditron 70b, and Llama2-70b-OAsst-sft-v10.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README. Users should verify licensing for commercial use or closed-source linking.
Limitations & Caveats
The README does not specify hardware requirements beyond "commodity hardware on multiple nodes," nor does it detail the setup time or resource footprint for training large models. The licensing status requires clarification for commercial applications.