turbo-alignment by turbo-llm

Library for LLM industrial alignment

Created 1 year ago · 398 stars · Top 73.7% on sourcepulse

Project Summary

Turbo-Alignment is a Python library designed for industrial-scale fine-tuning and alignment of large language models. It targets ML engineers and researchers seeking efficient, end-to-end pipelines for tasks like Supervised Fine-Tuning (SFT), Reward Modeling (RM), and Direct Preference Optimization (DPO), offering streamlined deployment of new methods and comprehensive logging.
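To make the DPO objective concrete, here is the standard loss from the original DPO paper as a short PyTorch sketch. This is the textbook formula under the usual definitions (summed per-token log-probabilities for each answer), not turbo-alignment's actual trainer code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Standard DPO objective: push the policy to prefer the chosen answer
    over the rejected one, with beta controlling how far the policy may
    drift from the frozen reference model."""
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()
```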

How It Works

The library provides an end-to-end pipeline from data preprocessing to model alignment, supporting various alignment methods including SFT, RM, Offline Preference Optimization, and Online Preference Optimization. It integrates with vLLM for fast inference and includes a wide array of metrics like Self-BLEU, KL divergence, and diversity for comprehensive evaluation.
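As an illustration of the diversity metrics named above, the following is a minimal Self-BLEU sketch written from the metric's standard definition (each generation is scored against all the others as references; lower means more diverse). It is not turbo-alignment's implementation.

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def self_bleu(generations: list[str]) -> float:
    """Average BLEU of each generation against all other generations.
    Requires at least two generations."""
    smooth = SmoothingFunction().method1
    scores = []
    for i, hypothesis in enumerate(generations):
        references = [g.split() for j, g in enumerate(generations) if j != i]
        scores.append(sentence_bleu(references, hypothesis.split(),
                                    smoothing_function=smooth))
    return sum(scores) / len(scores)

print(self_bleu(["the cat sat down", "the dog ran off", "a bird flew away"]))
```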

Quick Start & Requirements

  • Install via pip: pip install turbo-alignment
  • For latest features: pip install git+https://github.com/turbo-llm/turbo-alignment.git
  • Development setup requires poetry install.
  • Requires datasets formatted as ChatDataset or PairPreferencesDataset (see the record sketch after this list).
  • Official guides and tutorials are available in the project repository.
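The exact on-disk schema is documented in the official guides; purely as a hypothetical illustration, chat and pairwise-preference records in JSONL often look like the following. Field names here (messages, answer_w, answer_l, and so on) are assumptions based on common conventions, not the library's confirmed schema.

```python
import json

# Hypothetical chat-style record (SFT): a conversation as a list of turns.
chat_record = {
    "id": "0",
    "messages": [
        {"role": "user", "content": "What is model alignment?"},
        {"role": "bot", "content": "Training a model to follow human intent."},
    ],
}

# Hypothetical pairwise-preference record (RM/DPO): shared context plus a
# preferred ("answer_w") and rejected ("answer_l") completion.
pair_record = {
    "id": "0",
    "context": [{"role": "user", "content": "What is model alignment?"}],
    "answer_w": {"role": "bot", "content": "A clear, accurate explanation."},
    "answer_l": {"role": "bot", "content": "An off-topic reply."},
}

with open("train_chat.jsonl", "w") as f:
    f.write(json.dumps(chat_record) + "\n")
```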

Highlighted Details

  • Supports Supervised Fine-Tuning, Reward Modeling, Offline and Online Preference Optimization.
  • Implements metrics: Accuracy, Distinctness, Diversity, Self-BLEU, KL divergence, Reward, Length, Perplexity (a KL sketch follows this list).
  • Optimized for fast inference using vLLM.
  • End-to-end pipelines from data preprocessing to model alignment.
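For the KL-divergence metric listed above, a textbook per-token estimator between a fine-tuned policy and its frozen reference model looks like the sketch below; this is the standard formula, not the library's own code.

```python
import torch

def sequence_kl(policy_logits: torch.Tensor,
                ref_logits: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
    """Mean KL(policy || reference) over non-padding token positions.

    policy_logits, ref_logits: (batch, seq_len, vocab)
    mask: (batch, seq_len), 1 for real tokens, 0 for padding.
    """
    policy_logp = torch.log_softmax(policy_logits, dim=-1)
    ref_logp = torch.log_softmax(ref_logits, dim=-1)
    # KL(p || q) = sum_v p(v) * (log p(v) - log q(v)), per token position.
    token_kl = (policy_logp.exp() * (policy_logp - ref_logp)).sum(dim=-1)
    return (token_kl * mask).sum() / mask.sum()
```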

Maintenance & Community

The project credits implementations from Hugging Face's TRL, AllenNLP, and LinkedIn's Liger-Kernel. Contribution guidelines and development-environment setup instructions are provided.

Licensing & Compatibility

The license terms are specified in the repository's LICENSE file. Whether the project is suitable for commercial use or closed-source linking is not explicitly documented.

Limitations & Caveats

The roadmap indicates that online RL methods (PPO, REINFORCE), distributed training, and low-memory training approaches are still in progress.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days
