This repository provides Pai-Megatron-Patch, a toolkit for efficiently training and running inference with Large Language Models (LLMs) and Vision-Language Models (VLMs) on the Megatron framework. It targets developers who need to maximize GPU utilization for large-scale models, offering accelerated training techniques and broad model compatibility.
How It Works
Pai-Megatron-Patch applies a "patch" philosophy, extending Megatron-LM's capabilities without invasive source code modifications, which keeps it compatible with Megatron-LM upgrades. It includes a model library with Megatron implementations of popular LLMs, bidirectional weight converters between Huggingface and Megatron formats, and integration with Flash Attention 2.0 and Transformer Engine, the latter providing FP8 training acceleration.
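To make the patch idea concrete, here is a minimal sketch, assuming a recent Megatron-LM release whose generic pretrain() entry point accepts user-supplied callbacks; the megatron_patch module paths and provider names below are hypothetical stand-ins, not the project's exact API:

```python
# Minimal sketch of the "patch" approach: plug custom callbacks into
# Megatron-LM's generic training loop instead of editing Megatron-LM source.
# NOTE: the pretrain() import path and signature differ across Megatron-LM
# releases, and the megatron_patch module paths below are hypothetical.
from megatron.training import pretrain           # Megatron-LM's generic entry point
from megatron.core.enums import ModelType

# Hypothetical patch-side callbacks: model construction, data pipeline, and
# the per-iteration forward/loss computation all live in the patch package.
from megatron_patch.model.example_llm import model_provider
from megatron_patch.data import train_valid_test_datasets_provider
from megatron_patch.training import forward_step

if __name__ == "__main__":
    # Megatron-LM parses CLI arguments (tensor/pipeline parallel sizes, FP8
    # flags, etc.) and drives the training loop; the patch only supplies callbacks.
    pretrain(
        train_valid_test_datasets_provider,
        model_provider,
        ModelType.encoder_or_decoder,
        forward_step,
    )
```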
Quick Start & Requirements
- Installation and usage details are available via the "Quick Start" link in the README.
- Requires NVIDIA GPUs with CUDA support for large-scale training; individual model recipes may have additional dependencies (see the environment check sketch after this list).
- Refer to the official documentation for detailed setup and examples.
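As a quick environment sanity check (a generic PyTorch/Transformer Engine probe, not a Pai-Megatron-Patch utility), the following sketch confirms that CUDA devices are visible and that Transformer Engine can be imported before launching a run:

```python
# Generic GPU/CUDA sanity check before launching large-scale training.
# Uses only standard PyTorch / Transformer Engine imports; not a Pai-Megatron-Patch API.
import torch

assert torch.cuda.is_available(), "No CUDA device visible; Megatron training requires NVIDIA GPUs."
for idx in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(idx)
    # FP8 kernels generally require compute capability 8.9+ (Ada/Hopper).
    print(f"GPU {idx}: {torch.cuda.get_device_name(idx)} (compute capability {major}.{minor})")

try:
    import transformer_engine.pytorch  # noqa: F401  # required for the FP8 training path
    print("Transformer Engine is available; FP8 acceleration can be enabled.")
except ImportError:
    print("Transformer Engine is not installed; FP8 acceleration will be unavailable.")
```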
Highlighted Details
- Supports a wide range of LLMs including Llama, Qwen, Mistral, DeepSeek, and more.
- Facilitates bidirectional weight conversion between Huggingface and Megatron formats (a simplified remapping sketch follows this list).
- Integrates Flash Attention 2.0 and Transformer Engine, with FP8 training acceleration provided by Transformer Engine.
- Includes PPO training workflows for reinforcement learning.
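To illustrate what Huggingface-to-Megatron weight conversion involves, here is a simplified sketch; the real converter scripts also split tensors for tensor parallelism and reorder fused QKV weights, and the key mapping below is an illustrative assumption rather than the project's exact scheme:

```python
# Simplified sketch of Huggingface -> Megatron-style parameter-name remapping.
# Real converters also split/merge tensors for tensor parallelism and reorder
# fused QKV weights; this mapping is an illustrative assumption only.
import re
from transformers import AutoModelForCausalLM

HF_TO_MEGATRON_PATTERNS = [
    (r"^model\.embed_tokens\.weight$", "embedding.word_embeddings.weight"),
    (r"^model\.layers\.(\d+)\.input_layernorm\.weight$", r"decoder.layers.\1.input_layernorm.weight"),
    (r"^model\.layers\.(\d+)\.mlp\.down_proj\.weight$", r"decoder.layers.\1.mlp.linear_fc2.weight"),
    (r"^lm_head\.weight$", "output_layer.weight"),
]

def remap_state_dict(hf_state_dict):
    """Rename a subset of Huggingface keys to Megatron-style keys; pass others through."""
    remapped = {}
    for key, tensor in hf_state_dict.items():
        new_key = key
        for pattern, replacement in HF_TO_MEGATRON_PATTERNS:
            new_key, count = re.subn(pattern, replacement, key)
            if count:
                break
        remapped[new_key] = tensor
    return remapped

if __name__ == "__main__":
    hf_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")  # example checkpoint
    megatron_like = remap_state_dict(hf_model.state_dict())
    print(f"Remapped {len(megatron_like)} tensors")
```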
Maintenance & Community
- Developed by Alibaba Cloud's Machine Learning Platform (PAI) algorithm team.
- Community contact is available via a DingTalk group QR code.
Licensing & Compatibility
- Licensed under the Apache License (Version 2.0).
- May contain code from other repositories under different open-source licenses; consult the NOTICE file.
Limitations & Caveats
- Some features are marked as experimental, such as distributed checkpoint conversion.
- Model support is documented per model via links in the README, so coverage and features vary by model and evolve over time.