Automodel by NVIDIA-NeMo

PyTorch-native SPMD library for LLM/VLM training

Created 8 months ago
256 stars

Top 98.7% on SourcePulse

Project Summary

NeMo AutoModel is an open-source PyTorch training library designed to streamline and scale the training and fine-tuning of Large Language Models (LLMs) and Vision-Language Models (VLMs). It targets researchers and engineers, enabling rapid experimentation from small-scale setups to massive multi-GPU, multi-node deployments. The library offers flexibility, reproducibility, and high performance with minimal ceremony, featuring seamless integration with Hugging Face models.

How It Works

The core design is a PyTorch Distributed-native SPMD (Single Program, Multiple Data) approach that uses DTensor for parallelism. Under this "one program, any scale" philosophy, a single training script runs across different hardware configurations by adjusting only the distributed mesh. Parallelism strategies (tensor, sequence, data) are defined in configuration files rather than through model code rewrites, which decouples model logic from parallel execution. This composable, portable design makes it simpler to scale up, change strategies, and reason about failure modes.
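The idea of changing scale by adjusting only the mesh can be sketched in plain Python. This is an illustration under stated assumptions, not NeMo AutoModel's actual API: `ParallelConfig`, `mesh_shape`, and the dimension names are hypothetical, and the rule that the data-parallel degree absorbs whatever ranks remain is a common convention rather than something the project summary confirms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    """Hypothetical parallelism degrees, as they might be read from a config file."""
    tp: int = 1  # tensor-parallel degree
    cp: int = 1  # context-parallel degree
    pp: int = 1  # pipeline-parallel degree


def mesh_shape(world_size: int, cfg: ParallelConfig) -> dict[str, int]:
    """Derive a named mesh shape for a given number of ranks.

    Assumption: the data-parallel degree is whatever remains after the
    model-parallel degrees are fixed by the config.
    """
    model_parallel = cfg.tp * cfg.cp * cfg.pp
    if world_size <= 0 or world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*cp*pp={model_parallel}"
        )
    return {
        "pp": cfg.pp,
        "dp": world_size // model_parallel,
        "cp": cfg.cp,
        "tp": cfg.tp,
    }


if __name__ == "__main__":
    # The same function (standing in for the same training script) serves
    # every scale; only the rank count and the config values change.
    for world_size, cfg in [
        (1, ParallelConfig()),
        (8, ParallelConfig(tp=2)),
        (64, ParallelConfig(tp=4, pp=2)),
    ]:
        print(world_size, mesh_shape(world_size, cfg))
```

In a real PyTorch job, sizes and names like these could be handed to `torch.distributed.device_mesh.init_device_mesh` to build the mesh that DTensor-based sharding runs on; how AutoModel wires this internally is not described in this summary.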

Quick Start & Requirements

Highlighted Details

  • SPMD Parallelism: Configuration-driven parallelism (FSDP2, TP, CP, SP, Pipeline, HSDP) without model code modification.
  • Hugging Face Integration: Native support for a vast array of LLMs and VLMs from the Hugging Face Hub.
  • Training Capabilities: Supports LLM/VLM pre-training, Supervised Fine-Tuning (SFT), Parameter-Efficient Fine-Tuning (PEFT), and Knowledge Distillation.
  • Performance: Demonstrates high training throughput on NVIDIA GPUs, with optimizations like FP8 support and sequence packing.
  • Interoperability: Integrates with NeMo RL, Hugging Face, and offers Megatron Bridge conversions.
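Sequence packing, mentioned under Performance above, can be illustrated with a generic sketch. This is not NeMo AutoModel's implementation: `pack_sequences` is a hypothetical helper that uses a simple first-fit-decreasing heuristic to group variable-length sequences into fixed-length rows so that fewer tokens are spent on padding.

```python
def pack_sequences(lengths: list[int], max_len: int) -> list[list[int]]:
    """Group sequence indices into rows whose total length fits in max_len.

    Uses first-fit-decreasing: visit sequences from longest to shortest and
    place each one into the first row that still has room.
    """
    rows: list[list[int]] = []   # sequence indices assigned to each packed row
    free: list[int] = []         # remaining token capacity of each packed row
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sequence {i} has length {n} > max_len {max_len}")
        for r, capacity in enumerate(free):
            if n <= capacity:
                rows[r].append(i)
                free[r] -= n
                break
        else:
            # No existing row had room, so open a new one.
            rows.append([i])
            free.append(max_len - n)
    return rows


if __name__ == "__main__":
    lengths = [5, 3, 2, 7, 1]
    rows = pack_sequences(lengths, max_len=8)
    print(rows)
    padded_unpacked = len(lengths) * 8  # one padded row per sequence
    padded_packed = len(rows) * 8       # one padded row per packed group
    print(f"token slots: {padded_unpacked} unpacked vs {padded_packed} packed")
```

A production packer also has to keep packed sequences from attending to one another, typically through attention masking and per-sequence position IDs; the sketch covers only the grouping step.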

Maintenance & Community

The project is under active development, with regular updates and a roadmap towards a stable release. Contributions are welcomed via the provided contributing guide. Recent news highlights new model support and technical advancements. No explicit community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, a permissive license that allows commercial use and integration into closed-source projects.

Limitations & Caveats

NeMo AutoModel is actively under development, and users should expect the interface to evolve as the project moves towards a stable release. New features and improvements are continuously being added.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 71
  • Issues (30d): 42
  • Star History: 38 stars in the last 30 days
