Megatron-LLM by epfLLM

Distributed trainer for LLMs

Created 2 years ago
580 stars

Top 55.8% on SourcePulse

Project Summary

This repository provides a distributed training framework for large language models (LLMs), enabling pre-training and fine-tuning at scale. It is a fork of Nvidia's Megatron-LM, enhanced with support for modern architectures like Llama, Mistral, and Falcon, and optimized for training large models on commodity hardware. The primary audience is researchers and engineers working with LLMs who need efficient distributed training capabilities.

How It Works

The framework leverages a 3-way parallelism strategy (tensor, pipeline, and data parallelism) inherited from Megatron-LM. It incorporates advanced techniques such as grouped-query attention (GQA), multi-query attention (MQA), Rotary Position Embeddings (RoPE) with scaling for longer contexts, RMS layer norm, LIMA dropout, and FlashAttention 2 for improved performance and efficiency. Support for BF16/FP16 training and seamless integration with Hugging Face models further enhance its utility.
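
As a concrete illustration of two of these components, the sketch below reimplements RMS layer norm and RoPE with a simple linear position-scaling factor in plain PyTorch. It is a minimal sketch for clarity, not the repository's own implementation; the epsilon value and the linear scaling scheme are assumptions.

    import torch

    class RMSNorm(torch.nn.Module):
        """RMS layer norm: rescale by the root mean square of the features,
        with a learned gain and no mean subtraction (unlike LayerNorm)."""
        def __init__(self, dim: int, eps: float = 1e-6):
            super().__init__()
            self.eps = eps
            self.weight = torch.nn.Parameter(torch.ones(dim))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            inv_rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
            return x * inv_rms * self.weight

    def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0,
                    scale: float = 1.0) -> torch.Tensor:
        """Rotation angles for RoPE; scale > 1 compresses position indices so the
        model can cover contexts longer than its pretraining length."""
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        positions = torch.arange(seq_len).float() / scale
        return torch.outer(positions, inv_freq)            # (seq_len, head_dim // 2)

    def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
        """Rotate channel pairs of q or k by position-dependent angles.
        x: (batch, seq, heads, head_dim), angles: (seq, head_dim // 2)."""
        x1, x2 = x[..., 0::2], x[..., 1::2]
        cos = angles.cos()[None, :, None, :]
        sin = angles.sin()[None, :, None, :]
        rotated = torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)
        return rotated.flatten(-2)                          # re-interleave the rotated pairs

    # Example: rotate query vectors with a doubled-context scaling factor.
    q = torch.randn(1, 8, 4, 64)                            # (batch, seq, heads, head_dim)
    q = apply_rope(q, rope_angles(64, 8, scale=2.0))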

Quick Start & Requirements

  • Install dependencies with pip install -r requirements.txt inside the docs/ directory.
  • The documentation must be built from source.
  • Further setup details are available in the online documentation.

Highlighted Details

  • Supports training of large models (up to 70B parameters) on commodity hardware across multiple nodes.
  • Implements 3-way parallelism: tensor, pipeline, and data parallelism.
  • Includes support for Llama, Llama 2, Code Llama, Falcon, and Mistral architectures.
  • Features GQA, MQA, RoPE, RoPE scaling, RMS layer norm, LIMA dropout, and FlashAttention 2 (see the grouped-query attention sketch after this list).
  • Offers full pretraining, fine-tuning, and instruct tuning capabilities.
  • Integrates with WandB for logging and supports custom metrics.
  • Enables conversion to and from Hugging Face hub.
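
To make the GQA/MQA bullet concrete, here is a minimal grouped-query attention sketch in plain PyTorch: several query heads share each key/value head, and MQA is the special case of a single key/value head. The head counts are illustrative, and PyTorch's scaled_dot_product_attention stands in for the fused FlashAttention kernels used in the actual training code.

    import torch
    import torch.nn.functional as F

    def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        """q: (batch, seq, n_query_heads, head_dim); k, v: (batch, seq, n_kv_heads, head_dim),
        where n_query_heads is a multiple of n_kv_heads."""
        group = q.shape[2] // k.shape[2]
        # Duplicate each key/value head so every query head in its group attends to the same KV head.
        k = k.repeat_interleave(group, dim=2)
        v = v.repeat_interleave(group, dim=2)
        # Move to (batch, heads, seq, head_dim), the layout scaled_dot_product_attention expects.
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return out.transpose(1, 2)                          # back to (batch, seq, heads, head_dim)

    # 8 query heads sharing 2 key/value heads; n_kv_heads = 1 would be MQA.
    q = torch.randn(2, 16, 8, 64)
    k = torch.randn(2, 16, 2, 64)
    v = torch.randn(2, 16, 2, 64)
    print(grouped_query_attention(q, k, v).shape)           # torch.Size([2, 16, 8, 64])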

Maintenance & Community

The project is actively maintained by a team of researchers from EPFL. Notable models trained using this framework include TOWER, Meditron 70b, and Llama2-70b-OAsst-sft-v10.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Users should verify licensing for commercial use or closed-source linking.

Limitations & Caveats

The README does not specify hardware requirements beyond "commodity hardware on multiple nodes," nor does it detail the setup time or resource footprint for training large models. The licensing status requires clarification for commercial applications.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

2 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

picotron by huggingface

4.8%
2k stars
Minimalist distributed training framework for educational use
Created 1 year ago
Updated 3 weeks ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Lewis Tunstall (Research Engineer at Hugging Face), and 13 more.

torchtitan by pytorch

0.7%
4k stars
PyTorch platform for generative AI model training research
Created 1 year ago
Updated 18 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.2%
7k stars
Framework for training large-scale autoregressive language models
Created 4 years ago
Updated 2 days ago
Starred by Tobi Lutke (Cofounder of Shopify), Li Jiang (Coauthor of AutoGen; Engineer at Microsoft), and 26 more.

ColossalAI by hpcaitech

0.1%
41k stars
AI system for large-scale parallel training
Created 3 years ago
Updated 12 hours ago