Megatron-LLM by epfLLM

Distributed trainer for LLMs

Created 2 years ago · 578 stars · Top 56.8% on sourcepulse

Project Summary

This repository provides a distributed training framework for large language models (LLMs), enabling pre-training and fine-tuning at scale. It is a fork of Nvidia's Megatron-LM, enhanced with support for modern architectures like Llama, Mistral, and Falcon, and optimized for training large models on commodity hardware. The primary audience is researchers and engineers working with LLMs who need efficient distributed training capabilities.

How It Works

The framework inherits Megatron-LM's 3-way parallelism strategy, combining tensor, pipeline, and data parallelism. On top of this it adds modern techniques such as grouped-query attention (GQA), multi-query attention (MQA), Rotary Position Embeddings (RoPE) with scaling for longer contexts, RMS layer norm, LIMA dropout, and FlashAttention 2 for improved performance and efficiency. BF16/FP16 training and integration with Hugging Face models further enhance its utility.
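
As a rough illustration of how those three parallelism degrees carve a fixed pool of GPUs into tensor-, pipeline-, and data-parallel coordinates, the sketch below maps each global rank to its position in the 3-D device grid. The rank ordering and function name are illustrative assumptions, not Megatron-LLM's actual process-group setup.

```python
# Minimal sketch: factoring a world of GPUs into the three parallelism
# degrees used by Megatron-style trainers. The rank layout is illustrative.

def parallel_coords(rank: int, tp: int, pp: int, dp: int):
    """Map a global rank to (tensor, pipeline, data) coordinates,
    assuming tensor-parallel ranks are placed innermost, then data,
    then pipeline (an assumption for this sketch)."""
    assert 0 <= rank < tp * pp * dp
    tp_rank = rank % tp
    dp_rank = (rank // tp) % dp
    pp_rank = rank // (tp * dp)
    return tp_rank, pp_rank, dp_rank

if __name__ == "__main__":
    # Example: 16 GPUs split as 4-way tensor x 2-way pipeline x 2-way data.
    tp, pp, dp = 4, 2, 2
    for rank in range(tp * pp * dp):
        print(rank, parallel_coords(rank, tp, pp, dp))
```

With this 4×2×2 split, each pipeline stage holds four tensor-parallel shards of the model, and every shard is replicated twice for data parallelism.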

Quick Start & Requirements

  • To build the documentation from source, install its dependencies with pip install -r requirements.txt inside the docs/ directory.
  • Further details are available in the project's online documentation.

Highlighted Details

  • Supports training of large models (up to 70B parameters) on commodity hardware across multiple nodes.
  • Implements 3-way parallelism: tensor, pipeline, and data parallelism.
  • Includes support for Llama, Llama 2, Code Llama, Falcon, and Mistral architectures.
  • Features GQA, MQA, RoPE, RoPE scaling, RMS layer norm, LIMA dropout, and FlashAttention 2 (see the grouped-query attention sketch after this list).
  • Offers full pretraining, fine-tuning, and instruct tuning capabilities.
  • Integrates with WandB for logging and supports custom metrics.
  • Enables conversion to and from the Hugging Face Hub.
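
Since several of the listed features are attention variants, here is a minimal, framework-free sketch of grouped-query attention. It is an illustrative reference implementation, not code from this repository; multi-query attention is the special case num_kv_heads == 1, and standard multi-head attention is num_kv_heads == num_heads.

```python
import numpy as np

def grouped_query_attention(q, k, v, num_heads, num_kv_heads):
    """q: (seq, num_heads, head_dim); k, v: (seq, num_kv_heads, head_dim)."""
    assert num_heads % num_kv_heads == 0
    group = num_heads // num_kv_heads
    # Each key/value head is shared by `group` query heads.
    k = np.repeat(k, group, axis=1)   # -> (seq, num_heads, head_dim)
    v = np.repeat(v, group, axis=1)
    d = q.shape[-1]
    # Per-head scaled dot-product attention with a softmax over keys.
    scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("hqk,khd->qhd", weights, v)

# Example: 8 query heads sharing 2 key/value heads (4 queries per KV head).
seq, dim = 16, 32
q = np.random.randn(seq, 8, dim)
k = np.random.randn(seq, 2, dim)
v = np.random.randn(seq, 2, dim)
print(grouped_query_attention(q, k, v, num_heads=8, num_kv_heads=2).shape)  # (16, 8, 32)
```

Sharing key/value heads this way shrinks the KV cache by roughly num_heads / num_kv_heads, which is the main practical motivation for GQA and MQA when serving large models with long contexts.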

Maintenance & Community

The project is maintained by a team of researchers from EPFL. Notable models trained with this framework include TOWER, Meditron-70B, and llama2-70b-oasst-sft-v10.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Users should verify licensing for commercial use or closed-source linking.

Limitations & Caveats

The README does not specify hardware requirements beyond "commodity hardware on multiple nodes," nor does it detail the setup time or resource footprint for training large models. The licensing status requires clarification for commercial applications.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Ying Sheng (author of SGLang), and 9 more.

  • alpaca-lora by tloen: LoRA fine-tuning for LLaMA (19k stars; created 2 years ago, updated 1 year ago)