NeMo-Aligner by NVIDIA

Toolkit for efficient model alignment

created 1 year ago
833 stars

Top 43.6% on sourcepulse

Project Summary

NVIDIA NeMo-Aligner is a scalable toolkit for efficient model alignment, enabling users to make language models safer, more helpful, and less harmful. It supports advanced alignment techniques such as SteerLM, Direct Preference Optimization (DPO), and Reinforcement Learning from Human Feedback (RLHF), and targets researchers and developers working with large language models.

How It Works

NeMo-Aligner leverages the NeMo Framework for distributed training across thousands of GPUs, using tensor, data, and pipeline parallelism. This architecture keeps alignment performant and resource-efficient even for very large models. The toolkit integrates state-of-the-art algorithms, including SteerLM for attribute-conditioned fine-tuning and RLHF via PPO or REINFORCE, with recent support for TensorRT-LLM to accelerate generation in RLHF pipelines.
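
To make one of these objectives concrete, below is a minimal sketch of the DPO loss in PyTorch. It illustrates the underlying objective only; it is not NeMo-Aligner's distributed implementation, and the function and argument names are invented for clarity.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each tensor holds summed per-token log-probabilities for a batch
        # of (chosen, rejected) response pairs, under the policy being
        # trained and under a frozen reference model.
        chosen_ratio = policy_chosen_logps - ref_chosen_logps
        rejected_ratio = policy_rejected_logps - ref_rejected_logps
        # Widen the margin between preferred and dispreferred responses,
        # scaled by the temperature-like hyperparameter beta.
        return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

In the toolkit itself, computations like this are sharded across the tensor, data, and pipeline parallelism dimensions described above.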

Quick Start & Requirements

  • Installation: NeMo-Aligner comes pre-installed in the official NeMo container (nvcr.io/nvidia/nemo:24.07). Alternatively, install the NeMo Toolkit and then run pip install nemo-aligner, or clone the repository and run pip install . to get the latest commit (a smoke test follows this list).
  • Prerequisites: Requires NVIDIA GPUs, drivers, and the NeMo Toolkit. PyTriton is an additional requirement.
  • Resources: Utilizes NVIDIA's NeMo, Megatron-LM, and TransformerEngine.
  • Documentation: NeMo-Aligner Paper, RLHF Documentation.
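
Once the environment is set up, a quick way to confirm the installation is the smoke test below. It assumes the Python module is named nemo_aligner and that it exposes a __version__ attribute, per the usual NeMo packaging convention; adjust if the layout differs.

    # Smoke test (module name and __version__ attribute are assumptions).
    import nemo_aligner
    print(nemo_aligner.__version__)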

Highlighted Details

  • Supports SteerLM, DPO, RLHF (PPO, REINFORCE), and Self-Play Fine-Tuning (SPIN); a schematic SteerLM example follows this list.
  • Demonstrated alignment of Llama3-70B and Nemotron-70B models.
  • Accelerated generation support via TensorRT-LLM for RLHF.
  • Checkpoints are cross-compatible with the NeMo ecosystem.
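
To illustrate the attribute-conditioned idea behind SteerLM, the sketch below annotates a prompt with explicit attribute values so the response can be steered at inference time. The template, attribute names, and 0-9 scale are illustrative assumptions, not NeMo-Aligner's actual data format.

    def attribute_conditioned_prompt(user_prompt, attributes):
        # Serialize attribute targets (e.g. "helpfulness:9,toxicity:0") so a
        # SteerLM-style model can condition its response on them.
        attr_str = ",".join(f"{k}:{v}" for k, v in attributes.items())
        return f"{user_prompt}\n[attributes] {attr_str}\n[response]"

    # Steer toward a maximally helpful, non-toxic answer.
    prompt = attribute_conditioned_prompt(
        "Explain tensor parallelism in two sentences.",
        {"helpfulness": 9, "toxicity": 0},
    )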

Maintenance & Community

  • Actively developed by NVIDIA.
  • Community contributions are welcomed via CONTRIBUTING.md.
  • Paper available on arXiv: 2405.01481.

Licensing & Compatibility

  • Licensed under the Apache License, Version 2.0.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The toolkit is described as being in its early stages; work is ongoing to improve stability, particularly in the PPO learning phase, and to enhance overall RLHF performance.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 67 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley), and 5 more.

Liger-Kernel by linkedin: Triton kernels for efficient LLM training

  • Top 0.6% on sourcepulse; 5k stars
  • created 1 year ago; updated 2 days ago