Megatron-Bridge by NVIDIA-NeMo

Scalable LLM training and conversion between Hugging Face and Megatron Core

Created 7 months ago
348 stars

Top 79.9% on SourcePulse

View on GitHub
Project Summary

This library addresses the need for seamless interoperability between the Hugging Face ecosystem and NVIDIA's Megatron Core, along with efficient training of large language and vision-language models. It targets researchers and engineers who need advanced distributed training, offering a PyTorch-native solution for bidirectional model conversion, pretraining, and fine-tuning. The primary benefit is that users can apply Megatron Core's parallelism and optimized training infrastructure to familiar Hugging Face models, accelerating LLM/VLM development and deployment.

How It Works

Megatron-Bridge acts as a conversion and verification layer that performs bidirectional checkpoint conversion between Hugging Face and Megatron Core formats. It also provides a refactored, PyTorch-native training loop that leverages Megatron Core for advanced parallelism (tensor, pipeline) and mixed-precision training (FP8, BF16, FP4). Models can come from existing Hugging Face checkpoints or custom PyTorch definitions, with optimized paths through Transformer Engine for high throughput and scalability.
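
As a rough illustration of the round-trip workflow, here is a minimal Python sketch. It assumes an AutoBridge-style entry point along the lines of the project's README; the import path, method names, and the checkpoint used are illustrative and should be verified against the current documentation.

    # Hypothetical round-trip: Hugging Face -> Megatron Core -> Hugging Face.
    # AutoBridge and its methods follow the project's README as of this summary;
    # confirm names against the installed version before relying on them.
    from megatron.bridge import AutoBridge

    # Import a Hugging Face checkpoint and build a Megatron Core model provider.
    bridge = AutoBridge.from_hf_pretrained("meta-llama/Llama-3.2-1B")
    provider = bridge.to_megatron_provider()

    model = ...  # build the Megatron model from the provider and train it (elided)

    # Export the trained weights back to Hugging Face format.
    bridge.save_hf_pretrained(model, "./llama-3.2-1b-hf-export")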

Quick Start & Requirements

The recommended installation is via the NeMo Framework container (nvcr.io/nvidia/nemo:${TAG}). A Python 3.10+ environment is required. Users must log in to the Hugging Face Hub (huggingface-cli login). Training scripts are typically launched with torchrun.
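
For scripted setups inside the container, the Hub login can also be done programmatically through the huggingface_hub package; the token below is a placeholder.

    # Programmatic alternative to `huggingface-cli login`.
    from huggingface_hub import login

    login(token="hf_...")  # placeholder; use your own Hub access token

A typical single-node multi-GPU launch then looks like torchrun --nproc_per_node=8 <training_script.py>, where the script and its arguments depend on the chosen recipe.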

Highlighted Details

  • Bidirectional Conversion: Seamlessly converts checkpoints between Hugging Face and Megatron formats, supporting online import/export with memory-efficient streaming.
  • Advanced Parallelism: Integrates Megatron Core's parallelism (TP/PP/VPP/CP/EP/ETP) and supports mixed-precision training (FP8, BF16, FP4).
  • Flexible Training: Offers a customizable PyTorch-native training loop for fine-grained control over data loading, distributed training, and evaluation.
  • PEFT & SFT: Implements Supervised Fine-Tuning and Parameter-Efficient Fine-Tuning methods such as LoRA and DoRA (see the LoRA sketch after this list).
  • SOTA Recipes: Provides production-ready training recipes for popular LLMs (e.g., Llama 3, Qwen2.5) with optimized configurations.
  • Performance: Engineered for high utilization and near-linear scalability across thousands of nodes.
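
To make the PEFT bullet above concrete, the following minimal sketch shows the core LoRA computation (a frozen base weight plus a trainable low-rank update) in plain PyTorch. It illustrates the technique itself, not Megatron-Bridge's internal implementation; the class and parameter names are illustrative.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        """Minimal LoRA layer: y = base(x) + (alpha/r) * B(A(x)).

        The pretrained weight is frozen; only the low-rank factors A and B
        are trained. Illustrative only, not Megatron-Bridge's implementation.
        """

        def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # freeze the pretrained projection
            self.lora_a = nn.Linear(base.in_features, r, bias=False)
            self.lora_b = nn.Linear(r, base.out_features, bias=False)
            nn.init.zeros_(self.lora_b.weight)  # update starts as a no-op
            self.scaling = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

    # Usage: wrap an existing projection and train only the LoRA factors.
    layer = LoRALinear(nn.Linear(1024, 1024), r=8, alpha=16)
    out = layer(torch.randn(2, 1024))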

Maintenance & Community

The project is a continuation of MBridge and has been adopted by several downstream projects, including veRL, slime, SkyRL, and NeMo-RL. Community contributions are acknowledged.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README, which may impact commercial use or integration decisions.

Limitations & Caveats

The "Supported Models" table indicates that full training/fine-tuning recipes or checkpoint conversion are "Coming soon" for several models, suggesting incomplete support for certain architectures. Installation primarily relies on a Docker container.

Health Check

Last Commit: 13 hours ago
Responsiveness: Inactive
Pull Requests (30d): 162
Issues (30d): 101
Star History: 87 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack
Top 0.7% on SourcePulse · 278 stars
Efficiently train foundation models with PyTorch
Created 1 year ago · Updated 1 month ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM
Top 0.2% on SourcePulse · 417 stars
Lightweight training framework for model pre-training
Created 2 years ago · Updated 4 months ago

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI) and Jiayi Pan (Author of SWE-Gym; MTS at xAI).

Pai-Megatron-Patch by alibaba
Top 0.7% on SourcePulse · 2k stars
Training toolkit for LLMs & VLMs using Megatron
Created 2 years ago · Updated 3 weeks ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Elie Bursztein (Cybersecurity Lead at Google DeepMind), and 8 more.

h2o-llmstudio by h2oai
Top 0.1% on SourcePulse · 5k stars
LLM Studio: framework for LLM fine-tuning via GUI or CLI
Created 2 years ago · Updated 3 weeks ago