All-modality alignment framework for training models with feedback
Align-Anything is a modular framework for aligning large models across modalities (text, image, audio, and video) with human intentions. It targets researchers and developers who want to fine-tune multi-modal models using techniques such as Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO), and it offers a flexible platform for custom alignment tasks.
How It Works
The framework supports "any-to-any" model alignment, accepting and producing diverse input and output modalities. It implements multiple alignment algorithms (SFT, DPO, PPO, GRPO, SimPO, and KTO) and is designed for modularity, making it straightforward to customize and extend for new tasks and models. The project also includes a multi-modal CLI and supports O1-like training and rule-based RL.
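For a concrete sense of the preference-optimization methods listed above, the sketch below implements the standard published DPO objective in PyTorch. It is a minimal illustration, not Align-Anything's internal API; the function name, signature, and the beta default are assumptions for this example.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Textbook DPO loss over summed per-response token log-probs.

    Each tensor holds one value per prompt in the batch: the total
    log-probability the policy (or frozen reference model) assigns to
    the chosen or rejected response.
    """
    # Log-ratios of the trainable policy against the frozen reference.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps
    # DPO widens the margin between chosen and rejected log-ratios,
    # scaled by beta; -logsigmoid turns that into a minimizable loss.
    margin = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(margin).mean()

# Example with dummy log-probabilities for a batch of two prompts.
loss = dpo_loss(torch.tensor([-10.0, -12.0]), torch.tensor([-11.0, -12.5]),
                torch.tensor([-10.5, -12.1]), torch.tensor([-10.8, -12.2]))
```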
Quick Start & Requirements
pip install -e .          # NVIDIA GPU
pip install -e .[ascend]  # Huawei Ascend NPU
vllm is recommended for accelerated PPO training.
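A quick import check confirms the editable install landed on the Python path. The module name align_anything is an assumption inferred from the repository name; adjust it if the package exposes a different import path.

```python
# Hypothetical smoke test: module name inferred from the repo name.
import align_anything
print("align_anything imported OK")
```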
Highlighted Details
Maintenance & Community
The project is actively developed by the PKU-Alignment Team. Updates are frequent, with recent additions including support for new models (Emu3, MiniCPM-o, Janus) and alignment methods (GRPO). The project encourages reporting issues on GitHub.
Licensing & Compatibility
Released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
While coverage is extensive, some modality pipelines (e.g., reward modeling (RM) for Text -> Image/Video) are marked as work in progress (⚒️). Ascend NPU compatibility requires specific CANN and driver versions, and other configurations may need debugging.