Pai-Megatron-Patch by alibaba

Training toolkit for LLMs & VLMs using Megatron

Created 2 years ago
1,344 stars

Top 29.9% on SourcePulse

Project Summary

This repository provides Pai-Megatron-Patch, a toolkit for efficient training and inference of Large Language Models (LLMs) and Vision-Language Models (VLMs) built on the Megatron framework. It targets developers who want to maximize GPU utilization for large-scale models, offering accelerated training techniques and broad model compatibility.

How It Works

Pai-Megatron-Patch follows a "patch" philosophy: it extends Megatron-LM's capabilities without invasive modifications to Megatron-LM's source code, which helps the toolkit stay compatible with Megatron-LM upgrades. It includes a model library with implementations of popular LLMs, bidirectional weight converters between Huggingface and Megatron formats, and FP8 training acceleration via Flash Attention 2.0 and Transformer Engine.
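
To make the patch idea concrete, here is a minimal, hypothetical sketch of overriding a Megatron-LM method at import time rather than editing its source tree; the module and class names are illustrative stand-ins, not the project's actual patch points.

```python
# Minimal sketch of the "patch" approach: rebind behavior at import time
# instead of editing Megatron-LM's source files. The target module and class
# below are hypothetical stand-ins, not Pai-Megatron-Patch's actual patch points.
import megatron.core.transformer.attention as attention_mod

_original_forward = attention_mod.SelfAttention.forward

def patched_forward(self, *args, **kwargs):
    # Wrap custom logic (e.g., an optimized kernel) around the stock path.
    return _original_forward(self, *args, **kwargs)

# Rebinding the method leaves Megatron-LM's files untouched, so an upstream
# upgrade only requires re-validating the patch, not re-merging code.
attention_mod.SelfAttention.forward = patched_forward
```

Because the override lives entirely in the patch package, pinning a new Megatron-LM version is a dependency bump rather than a code merge.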

Quick Start & Requirements

  • Installation and usage details are available via the "Quick Start" link in the README.
  • Requires a CUDA-capable GPU environment for large-scale training; individual models may carry additional dependencies.
  • Refer to the official documentation for detailed setup and examples.

Highlighted Details

  • Supports a wide range of LLMs including Llama, Qwen, Mistral, DeepSeek, and more.
  • Facilitates bidirectional weight conversion between Huggingface and Megatron formats.
  • Offers FP8 training acceleration with Flash Attention 2.0 and Transformer Engine (see the sketch after this list).
  • Includes PPO training workflows for reinforcement learning.
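
As a rough illustration of the FP8 path, the following sketch uses Transformer Engine's public PyTorch API to run a linear layer under FP8 autocast. It is a generic Transformer Engine example, not Pai-Megatron-Patch's actual training loop, and it assumes an FP8-capable GPU (e.g., Hopper) with the transformer-engine package installed.

```python
# Generic Transformer Engine FP8 example (not the toolkit's training loop);
# assumes an FP8-capable GPU such as H100 and the transformer-engine package.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

# Delayed-scaling recipe: track an amax history to choose per-tensor FP8 scales.
fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16,
                            amax_compute_algo="max")

layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(8, 4096, device="cuda")

# Matmuls inside the context execute in FP8; master weights stay in higher precision.
with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
y.sum().backward()
```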

Maintenance & Community

  • Developed by Alibaba Cloud's Machine Learning Platform (PAI) algorithm team.
  • Contact is available through a DingTalk group QR code in the README.

Licensing & Compatibility

  • Licensed under the Apache License (Version 2.0).
  • May contain code from other repositories under different open-source licenses; consult the NOTICE file.

Limitations & Caveats

  • Some features are marked as experimental, such as distributed checkpoint conversion.
  • Per-model support is documented through individual links in the README, suggesting a modular, still-evolving integration strategy.
Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 10
  • Issues (30d): 17
  • Star History: 46 stars in the last 30 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

Efficiently train foundation models with PyTorch

265 stars
Top 0.4% on SourcePulse
Created 1 year ago; updated 1 month ago