align-anything by PKU-Alignment

All-modality alignment framework for training models with feedback

Created 1 year ago
4,549 stars

Top 10.8% on SourcePulse

View on GitHub
Project Summary

Align-Anything is a modular framework for aligning large language models across various modalities (text, image, audio, video) with human intentions. It targets researchers and developers seeking to fine-tune multi-modal models using techniques like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Proximal Policy Optimization (PPO), offering a flexible platform for custom alignment tasks.

How It Works

The framework supports an "any-to-any" model alignment approach, allowing for diverse input and output modalities. It implements multiple alignment algorithms (SFT, DPO, PPO, GRPO, SimPO, KTO) and is designed for modularity, enabling easy customization and extension for new tasks and models. The project also includes a multi-modal CLI and supports O1-like training and rule-based RL.
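
As a concrete illustration of the launch pattern, the sketch below shows how a text-to-text DPO run might be started with the DeepSpeed module runner. The module path, flags, and model/dataset names here are assumptions made for illustration, not verbatim from the repository; the maintained examples live in the repo's scripts/ directory.

    # Hypothetical DPO launch; module path, flags, and names are assumptions --
    # consult the repository's scripts/ directory for the real examples.
    MODEL_NAME_OR_PATH="Qwen/Qwen2-7B-Instruct"      # assumed base model
    TRAIN_DATASETS="PKU-Alignment/align-anything"    # assumed preference dataset
    OUTPUT_DIR="./outputs/qwen2_dpo"

    deepspeed \
      --module align_anything.trainers.text_to_text.dpo \
      --model_name_or_path "${MODEL_NAME_OR_PATH}" \
      --train_datasets "${TRAIN_DATASETS}" \
      --output_dir "${OUTPUT_DIR}"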

Quick Start & Requirements

  • Installation: pip install -e . for NVIDIA GPUs, pip install -e .[ascend] for Huawei Ascend NPUs.
  • Dependencies: Python 3.11 is recommended, with CUDA 12.2.0 for NVIDIA GPUs and specific CANN versions for Ascend NPUs. vLLM is recommended for accelerated PPO training.
  • Resources: Setup involves cloning the repo and installing dependencies (a minimal sketch follows this list). Training scripts handle model and dataset downloads.
  • Documentation: Official Documentation
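
Putting the items above together, a minimal first-time setup might look like the following. The clone URL points at the project's GitHub repository; the conda step is optional and simply reflects the Python 3.11 recommendation.

    # Python 3.11 is recommended; a fresh conda environment keeps dependencies isolated.
    conda create -n align-anything python=3.11 -y
    conda activate align-anything

    # Clone the repository and install in editable mode (NVIDIA GPU setup).
    git clone https://github.com/PKU-Alignment/align-anything.git
    cd align-anything
    pip install -e .

    # For Huawei Ascend NPUs, install the Ascend extra instead:
    # pip install -e .[ascend]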

Highlighted Details

  • Supports a wide range of modalities including Text, Image, Audio, and Video for various input/output combinations.
  • Integrates with vLLM for significant PPO training acceleration (e.g., 22 mins vs. 150 mins).
  • Offers support for both Nvidia GPUs and Huawei Ascend NPUs, including pre-configured Docker images for Ascend.
  • Includes example scripts for training and evaluation, with support for Slurm clusters (a generic submission sketch follows this list).
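
For Slurm clusters, a generic submission wrapper along the following lines could drive one of the repository's example scripts. The resource numbers and the script path are placeholders, not values taken from the repo.

    #!/bin/bash
    #SBATCH --job-name=align-anything-train
    #SBATCH --nodes=1
    #SBATCH --gres=gpu:8
    #SBATCH --time=24:00:00

    # Placeholder: substitute one of the repo's own training scripts under scripts/.
    bash scripts/your_training_script.sh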

Maintenance & Community

The project is actively developed by the PKU-Alignment Team. Updates are frequent, with recent additions including support for new models (Emu3, MiniCPM-o, Janus) and alignment methods (GRPO). The project encourages reporting issues on GitHub.

Licensing & Compatibility

Released under the Apache License 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While coverage is extensive, some modality combinations (e.g., reward modeling for Text -> Image/Video) are marked as work in progress (⚒️). Ascend NPU support requires adherence to specific CANN and driver versions, and other configurations may need additional debugging.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 30 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Elvis Saravia (founder of DAIR.AI).

NExT-GPT by NExT-GPT

  • Any-to-any multimodal LLM research paper
  • Top 0.1% on SourcePulse · 4k stars
  • Created 2 years ago · Updated 4 months ago
  • Starred by Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), Zack Li (cofounder of Nexa AI), and 19 more.

LLaVA by haotian-liu

  • Multimodal assistant with GPT-4 level capabilities
  • Top 0.2% on SourcePulse · 24k stars
  • Created 2 years ago · Updated 1 year ago