NeMo-Framework-Launcher by NVIDIA

Cloud-native tool for launching NeMo framework training jobs

created 2 years ago · 508 stars · Top 62.2% on sourcepulse

Project Summary

The NeMo Framework Launcher is a cloud-native tool designed for launching end-to-end training pipelines for Large Language Models (LLMs) and multimodal foundation models. It targets researchers and engineers working with generative AI, simplifying the complex process of large-scale model training on diverse compute environments, from on-premises clusters to cloud platforms.

How It Works

The launcher orchestrates the entire LLM training lifecycle, including data preparation, model parallelism configuration, training, fine-tuning (SFT, PEFT), evaluation, and export. It leverages advanced training techniques such as Tensor Parallelism, Pipeline Parallelism, Sequence Parallelism, Distributed Optimizer, and mixed-precision training (FP8, BF16) to enable efficient scaling to thousands of GPUs for training on trillions of tokens. The tool generates and manages submission scripts for cluster schedulers, organizes job results, and supports custom container images.
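
The launcher is configuration-driven: jobs are described in YAML files, and Hydra-style command-line overrides select the pipeline stage, cluster backend, and parallelism degrees. The sketch below is a hedged illustration, not verbatim usage; the stage name and config keys (training=gpt3/5b, tensor_model_parallel_size, and so on) are assumptions that may differ across launcher versions.

    # Hypothetical sketch: submit a GPT pretraining stage to a Slurm-managed
    # cluster, overriding node count and parallelism from the command line.
    # All stage/config names are assumptions; check the shipped YAML configs
    # for the keys your release actually uses.
    python main.py \
        stages=[training] \
        training=gpt3/5b \
        cluster=bcm \
        training.trainer.num_nodes=4 \
        training.model.tensor_model_parallel_size=2 \
        training.model.pipeline_model_parallel_size=1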

Quick Start & Requirements

  • Install: git clone https://github.com/NVIDIA/NeMo-Framework-Launcher.git && cd NeMo-Framework-Launcher && pip install -r requirements.txt
  • Prerequisites: Python; compatible with NeMo version 1.0 only. Tested with the NeMo Framework Container.
  • Usage: configure the .yaml files and run python main.py (see the sketch after this list).
  • Docs: NeMo Launcher Guide, NeMo Framework Playbooks
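
Put together, a first run might look like the following sketch. The override keys and stage list are assumptions based on the launcher's config-driven design, so treat this as a template rather than exact commands.

    # Hypothetical quick-start sketch; keys and values are assumptions.
    git clone https://github.com/NVIDIA/NeMo-Framework-Launcher.git
    cd NeMo-Framework-Launcher
    pip install -r requirements.txt
    # Edit the YAML configs (cluster type, container image, data and results
    # paths), then launch the configured stages:
    python main.py cluster=bcm stages=[data_preparation,training]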

Highlighted Details

  • Supports LLM pretraining and fine-tuning (SFT, PEFT) for models like GPT, BERT, and T5/MT5 (see the fine-tuning sketch after this list).
  • Scales training to thousands of GPUs and trillions of tokens.
  • Integrates advanced parallelism techniques (Tensor, Pipeline, Sequence, Distributed Optimizer).
  • Facilitates cluster setup, data management, and model deployment.
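
As an example of the fine-tuning support listed above, a PEFT run reuses the same entry point with a different stage selected. This is a hedged sketch: the config group (peft=llama/squad), scheme key, and paths are assumptions and will vary by launcher version and model family.

    # Hypothetical PEFT (LoRA) fine-tuning sketch; every name below is an
    # assumption, not a confirmed config key from this page.
    python main.py \
        stages=[peft] \
        peft=llama/squad \
        peft.model.peft.peft_scheme=lora \
        peft.model.restore_from_path=/path/to/base_model.nemo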

Maintenance & Community

Contributions are accepted via pull requests. Further community engagement details are not specified in the README.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: NeMo version 1.0 only. Designed for cloud-native and on-premises cluster deployment.

Limitations & Caveats

The launcher supports NeMo version 1.0 only, which may limit its applicability for users of newer NeMo Framework releases.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake) and Zhiqiang Xie (Author of SGLang).

veScale by volcengine
Top 0.1% · 839 stars
PyTorch-native framework for LLM training
created 1 year ago · updated 3 weeks ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

InternEvo by InternLM
Top 1.0% · 402 stars
Lightweight training framework for model pre-training
created 1 year ago · updated 1 week ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 8 more.

higgsfield by higgsfield-ai
Top 0.3% · 3k stars
ML framework for large model training and GPU orchestration
created 7 years ago · updated 1 year ago

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 6 more.

gpt-neox by EleutherAI
Top 0.1% · 7k stars
Framework for training large-scale autoregressive language models
created 4 years ago · updated 1 week ago