NeMo-Aligner by NVIDIA

Toolkit for efficient model alignment

Created 2 years ago
839 stars

Top 42.4% on SourcePulse

Project Summary

NVIDIA NeMo-Aligner is a scalable toolkit for efficient model alignment, helping users make language models safer, more helpful, and less harmful. It supports advanced alignment techniques such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF), and targets researchers and developers working with large language models.

How It Works

NeMo-Aligner leverages the NeMo Framework for distributed training across thousands of GPUs, utilizing tensor, data, and pipeline parallelism. This architecture ensures performant and resource-efficient alignment, even for large models. The toolkit integrates state-of-the-art algorithms, including SteerLM for attribute-conditioned fine-tuning and RLHF via PPO or REINFORCE, with recent support for TensorRT-LLM for accelerated generation in RLHF pipelines.
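
Both the PPO and REINFORCE variants of RLHF optimize a reward-model score regularized by a KL penalty that keeps the policy close to its starting (reference) model. The sketch below is an illustrative PyTorch restatement of that objective, not NeMo-Aligner's internal API; the function name, tensor shapes, and the kl_coef default are assumptions made for the example.

    import torch

    def kl_penalized_rewards(reward_model_scores: torch.Tensor,
                             policy_logprobs: torch.Tensor,
                             reference_logprobs: torch.Tensor,
                             kl_coef: float = 0.02) -> torch.Tensor:
        """Per-token rewards for PPO/REINFORCE-style RLHF (illustrative only)."""
        # Approximate per-token KL between the current policy and the reference model.
        kl = policy_logprobs - reference_logprobs          # [batch, seq_len]
        rewards = -kl_coef * kl                            # KL penalty at every token
        # Add the scalar reward-model score at the last token of each response.
        rewards[:, -1] += reward_model_scores              # [batch]
        return rewards

    # Dummy example: batch of 2 responses, 5 generated tokens each.
    scores = torch.tensor([1.3, -0.4])
    rewards = kl_penalized_rewards(scores, torch.randn(2, 5), torch.randn(2, 5))
    print(rewards.shape)  # torch.Size([2, 5])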

Quick Start & Requirements

  • Installation: NeMo-Aligner ships pre-installed in the official NeMo container (nvcr.io/nvidia/nemo:24.07). Alternatively, install the NeMo Toolkit and then run pip install nemo-aligner, or clone the repository and run pip install . for the latest commit (a quick import sanity check is sketched after this list).
  • Prerequisites: NVIDIA GPUs and drivers, the NeMo Toolkit, and PyTriton.
  • Resources: Utilizes NVIDIA's NeMo, Megatron-LM, and TransformerEngine.
  • Documentation: NeMo-Aligner Paper, RLHF Documentation.
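
After either install path, a quick sanity check can confirm that the toolkit and GPU stack are visible. This is a minimal sketch, assuming the package imports as nemo_aligner; the version attribute may differ between releases.

    # Hypothetical verification snippet; run inside the container or environment.
    import torch
    import nemo_aligner  # raises ImportError if the toolkit is not installed

    print("NeMo-Aligner version:", getattr(nemo_aligner, "__version__", "unknown"))
    print("CUDA available:", torch.cuda.is_available())
    print("Visible GPUs:", torch.cuda.device_count())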

Highlighted Details

  • Supports SteerLM, DPO, RLHF (PPO, REINFORCE), and Self-Play Fine-Tuning (SPIN); an illustrative DPO loss sketch follows this list.
  • Demonstrated alignment of Llama3-70B and Nemotron-70B models.
  • Accelerated generation support via TensorRT-LLM for RLHF.
  • Checkpoints are cross-compatible with the NeMo ecosystem.
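
As a refresher on the DPO objective referenced above, the standard loss from the original DPO paper can be sketched in a few lines of PyTorch. This is a readability-oriented restatement, not NeMo-Aligner's implementation; the function signature and the beta default are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps: torch.Tensor,
                 policy_rejected_logps: torch.Tensor,
                 ref_chosen_logps: torch.Tensor,
                 ref_rejected_logps: torch.Tensor,
                 beta: float = 0.1) -> torch.Tensor:
        """DPO loss over a batch of preference pairs (summed log-probs, shape [batch])."""
        # Implicit rewards: how much more the policy favors each response than the reference does.
        chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
        # Logistic loss that widens the margin between chosen and rejected responses.
        return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Dummy example with 4 preference pairs.
    loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))
    print(loss.item())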

Maintenance & Community

  • Actively developed by NVIDIA.
  • Community contributions are welcomed via CONTRIBUTING.md.
  • Paper available on arXiv: 2405.01481.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The toolkit is described as being in an early stage of development; ongoing work focuses on improving stability, particularly in the PPO learning phase, and on further improving RLHF performance.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Junyang Lin (core maintainer at Alibaba Qwen).

safe-rlhf by PKU-Alignment

  • Safe RLHF for constrained value alignment in LLMs
  • 0.1% · 2k stars
  • Created 2 years ago, updated 1 week ago
  • Starred by Sebastian Raschka (author of "Build a Large Language Model (From Scratch)"), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more

direct-preference-optimization by eric-mitchell

  • Reference implementation for Direct Preference Optimization (DPO)
  • 0.3% · 3k stars
  • Created 2 years ago, updated 1 year ago