Distributed trainer for LLMs
This repository provides a distributed training framework for large language models (LLMs), enabling pre-training and fine-tuning at scale. It is a fork of Nvidia's Megatron-LM, enhanced with support for modern architectures like Llama, Mistral, and Falcon, and optimized for training large models on commodity hardware. The primary audience is researchers and engineers working with LLMs who need efficient distributed training capabilities.
How It Works
The framework leverages a three-way parallelism strategy (tensor, pipeline, and data parallelism) inherited from Megatron-LM. It incorporates advanced techniques such as grouped-query attention (GQA), multi-query attention (MQA), rotary position embeddings (RoPE) with scaling for longer contexts, RMSNorm, LIMA dropout, and FlashAttention 2 for improved performance and efficiency. Support for BF16/FP16 training and seamless integration with Hugging Face models further enhance its utility.
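As a rough illustration of the head-sharing idea behind GQA and MQA (a minimal sketch, not the framework's actual implementation; the function name and tensor shapes are invented for the example), each key/value head is broadcast to a group of query heads:

```python
import torch

def grouped_query_attention(q, k, v):
    """Toy grouped-query attention (single batch, no mask, no RoPE).

    q: (n_q_heads, seq, head_dim)
    k, v: (n_kv_heads, seq, head_dim), where n_kv_heads divides n_q_heads.
    MQA is the special case n_kv_heads == 1; standard multi-head attention
    is n_kv_heads == n_q_heads.
    """
    n_q_heads, _, head_dim = q.shape
    n_kv_heads = k.shape[0]
    group_size = n_q_heads // n_kv_heads  # query heads sharing one KV head

    # Broadcast each KV head to its group of query heads.
    k = k.repeat_interleave(group_size, dim=0)
    v = v.repeat_interleave(group_size, dim=0)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5
    return torch.softmax(scores, dim=-1) @ v

# 8 query heads sharing 2 KV heads (group size 4) shrinks the KV cache 4x.
q, k, v = torch.randn(8, 16, 64), torch.randn(2, 16, 64), torch.randn(2, 16, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([8, 16, 64])
```

The practical payoff is a much smaller key/value cache at inference time with little quality loss, which is why architectures such as Llama 2 70B and Mistral adopt GQA.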
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Further documentation and usage instructions are available within the docs/ directory.
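When sizing a multi-node run, the product of the tensor- and pipeline-parallel degrees must divide the total GPU count; the remaining factor becomes the data-parallel replica count. A minimal sketch of that arithmetic (the function name and numbers are illustrative, not part of the framework's CLI):

```python
def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Derive the data-parallel degree of a Megatron-style 3D parallel layout."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel:
        raise ValueError("world size must be divisible by TP * PP")
    return world_size // model_parallel

# Example: 64 GPUs with TP=8 and PP=2 leave 4 data-parallel replicas.
print(data_parallel_size(64, tensor_parallel=8, pipeline_parallel=2))  # 4
```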
Highlighted Details
Maintenance & Community
The project is actively maintained by a team of researchers from EPFL. Notable models trained using this framework include TOWER, Meditron 70b, and Llama2-70b-OAsst-sft-v10.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README. Users should verify licensing for commercial use or closed-source linking.
Limitations & Caveats
The README does not specify hardware requirements beyond "commodity hardware on multiple nodes," nor does it detail the setup time or resource footprint for training large models. The licensing status requires clarification for commercial applications.