align-anything  by PKU-Alignment

All-modality alignment framework for training models with feedback

created 1 year ago
4,387 stars

Top 11.4% on sourcepulse

Project Summary

Align-Anything is a modular framework for aligning large language models across various modalities (text, image, audio, video) with human intentions. It targets researchers and developers seeking to fine-tune multi-modal models using techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO), offering a flexible platform for custom alignment tasks.
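For readers new to DPO, one of the techniques named above, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not taken from the Align-Anything codebase; the function names, argument names, and the default beta value are assumptions chosen for illustration, and the inputs are assumed to be summed sequence log-probabilities under the trained policy and a frozen reference model.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    All arguments except beta are log-probabilities of a full response.
    beta=0.1 is an illustrative default, not a value from the project.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)) equals softplus(-logits).
    return softplus(-logits)
```

When the policy and the reference agree on both responses, the loss equals log 2; it falls below that as the policy shifts probability toward the chosen response relative to the reference.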

How It Works

The framework supports an "any-to-any" model alignment approach, allowing for diverse input and output modalities. It implements multiple alignment algorithms (SFT, DPO, PPO, GRPO, SimPO, KTO) and is designed for modularity, enabling easy customization and extension for new tasks and models. The project also includes a multi-modal CLI and supports O1-like training and rule-based RL.
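As an illustration of one of the listed algorithms, GRPO replaces a learned value baseline with statistics computed over a group of responses sampled for the same prompt. The sketch below shows that group-relative advantage step only. It is a generic illustration rather than the framework's implementation; the function name, the eps value, and the use of the population standard deviation are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    rewards: scalar rewards for responses to the same prompt.
    eps: small constant (assumed) that avoids division by zero
         when every response in the group gets the same reward.
    """
    mean = statistics.fmean(rewards)
    # Population standard deviation; some implementations use the
    # sample standard deviation instead.
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

Responses that score above the group mean receive positive advantages and those below receive negative ones, so a rule-based reward (for example, 1 for a correct answer and 0 otherwise) can drive the policy update without a separate critic.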

Quick Start & Requirements

  • Installation: from the cloned repository, run pip install -e . for NVIDIA GPUs or pip install -e .[ascend] for Huawei Ascend NPUs.
  • Dependencies: Python 3.11 recommended. CUDA 12.2.0 for NVIDIA GPUs. Specific CANN versions for Ascend NPUs. vLLM is recommended for accelerated PPO training.
  • Resources: Setup involves cloning the repo and installing dependencies. Training scripts handle model and dataset downloads.
  • Documentation: Official Documentation
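Before installing, it can help to check the local environment against the recommendations above. The following is a small sketch using only the Python standard library; the function name preflight and the shape of its report are hypothetical and are not part of Align-Anything.

```python
import importlib.util
import sys

# Python version recommended in the requirements above.
RECOMMENDED_PYTHON = (3, 11)


def preflight(version_info=None, optional_modules=("vllm",)):
    """Report whether the interpreter matches the recommended version
    and whether each optional module can be found.

    This is a hypothetical helper for illustration only.
    """
    if version_info is None:
        version_info = sys.version_info
    report = {
        "python_matches_recommended":
            tuple(version_info[:2]) == RECOMMENDED_PYTHON,
    }
    for name in optional_modules:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(preflight())
```

The check does not import vLLM itself, so it runs quickly and has no side effects on machines without a GPU.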

Highlighted Details

  • Supports a wide range of modalities including Text, Image, Audio, and Video for various input/output combinations.
  • Integrates with vLLM to accelerate PPO training (e.g., 22 minutes versus 150 minutes, roughly a 6.8x speedup).
  • Offers support for both Nvidia GPUs and Huawei Ascend NPUs, including pre-configured Docker images for Ascend.
  • Includes example scripts for training and evaluation, with support for Slurm clusters.

Maintenance & Community

The project is actively developed by the PKU-Alignment Team. Updates are frequent, with recent additions including support for new models (Emu3, MiniCPM-o, Janus) and alignment methods (GRPO). The project encourages reporting issues on GitHub.

Licensing & Compatibility

Released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Coverage is broad but incomplete: some algorithm and modality combinations (e.g., reward modeling (RM) for Text -> Image/Video) are marked as "work in progress" (⚒️). Ascend NPU support requires specific CANN and driver versions, and other configurations may need debugging.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 841 stars in the last 90 days
