Stable-Alignment by agi-templar

RLHF alternative for training socially aligned language models

Created 2 years ago
353 stars

Top 79.0% on SourcePulse

Project Summary

This repository provides an alternative to Reinforcement Learning from Human Feedback (RLHF) for aligning language models, focusing on efficiency and stability. It targets researchers and developers seeking to train socially aligned LLMs by leveraging simulated human society interactions. The core benefit is a potentially more robust and less gameable alignment process.

How It Works

The project bypasses traditional reward modeling by training directly on interaction data generated within a simulated social environment ("Sandbox"). In these multi-agent simulations, language models act as social agents that interact and generate data, which is then used for alignment training with the aim of higher quality and stability than RLHF.
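
The loop below is a minimal sketch of this idea, not the repository's actual code: one agent drafts an answer, a peer agent critiques it, and the first agent revises, yielding a record that could serve as one alignment training sample. It assumes the legacy openai<1.0 Python client (with OPENAI_API_KEY set in the environment); the prompts, agent roles, and record fields are illustrative.

    import openai

    def ask(system, user, model="gpt-3.5-turbo"):
        # One chat-completion call on behalf of an agent persona.
        resp = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "system", "content": system},
                      {"role": "user", "content": user}],
        )
        return resp["choices"][0]["message"]["content"]

    question = "Is it okay to read my roommate's diary?"
    # Agent A drafts an answer, Agent B critiques it, Agent A revises.
    draft = ask("You are a social agent answering honestly.", question)
    feedback = ask("You are a peer agent. Rate the answer's social alignment "
                   "from 1 to 10 and suggest improvements.",
                   f"Question: {question}\nAnswer: {draft}")
    revision = ask("You are a social agent. Revise the answer using the feedback.",
                   f"Question: {question}\nDraft: {draft}\nFeedback: {feedback}")
    # One simulated-interaction record, usable as an alignment training sample.
    record = {"question": question, "draft": draft,
              "feedback": feedback, "revision": revision}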

Quick Start & Requirements

  • Install: pip install -r requirements.txt and pip install -e .
  • Prerequisites: Python, Git LFS, OpenAI API key (placed in .env).
  • Simulation: Requires text-davinci-002/003 and gpt-3.5-turbo or GPT-4 for the simulation agents.
  • Training: Requires a supervised fine-tuned (SFT) model and uses torchrun with FSDP. BF16 support is recommended.
  • Inference: Requires a downloaded model and PyTorch.
  • Data: Includes assets/sandbox_v1.json (93.8k samples) and assets/sandbox_v2.json (169k samples); the full dataset is available upon request (see the loading sketch after this list).
  • Docs: Official Paper
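
A quick sanity check on the bundled data; the record schema is an assumption, so print one entry to see the actual fields:

    import json

    # Load the smaller bundled dataset and inspect one record.
    with open("assets/sandbox_v1.json") as f:
        data = json.load(f)

    print(len(data))   # expected on the order of 93.8k samples
    print(data[0])     # field names vary; inspect to see the actual schema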

Highlighted Details

  • Offers a "So(cially)-Good Language Model" trained via the Stable Alignment method.
  • Provides code for running multi-agent social simulations in "Sandbox".
  • Released models include better-base, hh-rlhf-sft, and socially-good-lm.
  • Training utilizes a cosine learning rate scheduler with weight decay and warmup (sketched after this list).
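
A minimal sketch of that scheduler setup using Hugging Face's get_cosine_schedule_with_warmup helper; the stand-in model and the hyperparameter values (learning rate, weight decay, warmup and total steps) are illustrative, not the repository's actual configuration:

    import torch
    from transformers import get_cosine_schedule_with_warmup

    model = torch.nn.Linear(16, 16)  # stand-in for the actual SFT model
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=0.01)
    scheduler = get_cosine_schedule_with_warmup(
        optimizer,
        num_warmup_steps=100,      # linear warmup, then cosine decay
        num_training_steps=10_000,
    )

    for step in range(10_000):
        # ...forward pass, loss.backward(), and gradient clipping go here...
        optimizer.step()
        scheduler.step()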

Maintenance & Community

The project is associated with the paper "Training Socially Aligned Language Models in Simulated Human Society" by Liu et al. (2023). No specific community channels or active maintenance signals are detailed in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code and data are presented for research purposes, and commercial use would require clarification.

Limitations & Caveats

The project relies heavily on OpenAI's API for simulation agents, incurring costs and external dependencies. The "Stable Alignment" method's generalizability and robustness beyond the described simulations require further validation. The training process requires significant computational resources and specific FSDP configurations.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

RL4LMs by allenai (2k stars)
RL library to fine-tune language models to human preferences
Created 3 years ago · Updated 1 year ago
Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 6 more.

trlx by CarperAI (5k stars)
Distributed RLHF for LLMs
Created 3 years ago · Updated 1 year ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 19 more.