align-anything by PKU-Alignment

All-modality alignment framework for training models with feedback

Created 1 year ago
4,549 stars

Top 10.8% on SourcePulse

View on GitHub
Project Summary

Align-Anything is a modular framework for aligning large language models across various modalities (text, image, audio, video) with human intentions. It targets researchers and developers seeking to fine-tune multi-modal models using techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO), offering a flexible platform for custom alignment tasks.

How It Works

The framework supports an "any-to-any" model alignment approach, allowing for diverse input and output modalities. It implements multiple alignment algorithms (SFT, DPO, PPO, GRPO, SimPO, KTO) and is designed for modularity, enabling easy customization and extension for new tasks and models. The project also includes a multi-modal CLI and supports O1-like training and rule-based RL.
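
As a concrete illustration of the launch pattern, the sketch below shows how a text-to-text DPO run might be started with the DeepSpeed module runner. The module path, flags, and model/dataset names here are assumptions made for illustration, not verbatim from the repository; the maintained examples live in the repo's scripts/ directory.

    # Hypothetical DPO launch; module path, flags, and names are assumptions --
    # consult the repository's scripts/ directory for the real examples.
    MODEL_NAME_OR_PATH="Qwen/Qwen2-7B-Instruct"      # assumed base model
    TRAIN_DATASETS="PKU-Alignment/align-anything"    # assumed preference dataset
    OUTPUT_DIR="./outputs/qwen2_dpo"

    deepspeed \
      --module align_anything.trainers.text_to_text.dpo \
      --model_name_or_path "${MODEL_NAME_OR_PATH}" \
      --train_datasets "${TRAIN_DATASETS}" \
      --output_dir "${OUTPUT_DIR}"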

Quick Start & Requirements

  • Installation: pip install -e . for NVIDIA GPUs, pip install -e .[ascend] for Huawei Ascend NPUs.
  • Dependencies: Python 3.11 is recommended, with CUDA 12.2.0 for NVIDIA GPUs and specific CANN versions for Ascend NPUs. vLLM is recommended for accelerated PPO training.
  • Resources: Setup involves cloning the repo and installing dependencies (a minimal sketch follows this list). Training scripts handle model and dataset downloads.
  • Documentation: Official Documentation
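
Putting the items above together, a minimal first-time setup might look like the following. The clone URL points at the project's GitHub repository; the conda step is optional and simply reflects the Python 3.11 recommendation.

    # Python 3.11 is recommended; a fresh conda environment keeps dependencies isolated.
    conda create -n align-anything python=3.11 -y
    conda activate align-anything

    # Clone the repository and install in editable mode (NVIDIA GPU setup).
    git clone https://github.com/PKU-Alignment/align-anything.git
    cd align-anything
    pip install -e .

    # For Huawei Ascend NPUs, install the Ascend extra instead:
    # pip install -e .[ascend]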

Highlighted Details

  • Supports a wide range of modalities including Text, Image, Audio, and Video for various input/output combinations.
  • Integrates with vLLM for significant PPO training acceleration (e.g., 22 mins vs. 150 mins).
  • Offers support for both Nvidia GPUs and Huawei Ascend NPUs, including pre-configured Docker images for Ascend.
  • Includes example scripts for training and evaluation, with support for Slurm clusters (a generic submission sketch follows this list).
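
For Slurm clusters, a generic submission wrapper along the following lines could drive one of the repository's example scripts. The resource numbers and the script path are placeholders, not values taken from the repo.

    #!/bin/bash
    #SBATCH --job-name=align-anything-train
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:8
    #SBATCH --time=24:00:00

    # Placeholder: substitute one of the repo's own training scripts under scripts/.
    bash scripts/your_training_script.sh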

Maintenance & Community

The project is actively developed by the PKU-Alignment Team. Updates are frequent, with recent additions including support for new models (Emu3, MiniCPM-o, Janus) and alignment methods (GRPO). The project encourages reporting issues on GitHub.

Licensing & Compatibility

Released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While coverage is extensive, some modality combinations (e.g., reward modeling for Text -> Image/Video) are marked as work in progress (⚒️). Ascend NPU support requires adherence to specific CANN and driver versions, and other configurations may need additional debugging.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 30 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Elvis Saravia (founder of DAIR.AI).

NExT-GPT by NExT-GPT

  • Any-to-any multimodal LLM research paper
  • Top 0.1% on SourcePulse · 4k stars
  • Created 2 years ago · Updated 4 months ago
  • Starred by Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), Zack Li (cofounder of Nexa AI), and 19 more.

LLaVA by haotian-liu

  • Multimodal assistant with GPT-4 level capabilities
  • Top 0.2% on SourcePulse · 24k stars
  • Created 2 years ago · Updated 1 year ago