princeton-nlp/SimPO: Preference optimization algorithm for LLMs (NeurIPS 2024 paper)
Top 39.1% on SourcePulse
SimPO (Simple Preference Optimization) is a novel preference optimization algorithm for large language models that achieves state-of-the-art performance without relying on a reference model, simplifying the training process and reducing computational overhead. It is designed for researchers and practitioners aiming to enhance LLM alignment and instruction-following capabilities.
How It Works
SimPO introduces a reference-free reward formulation that directly optimizes the policy against a set of preferred and dispreferred responses. This approach avoids the complexity and potential biases associated with maintaining a separate reference model, leading to a more streamlined and efficient training pipeline. The core innovation lies in its ability to learn a reward signal implicitly from preference data, enabling direct policy updates.
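Concretely, the paper defines the implicit reward of a response as its length-normalized log-probability under the policy, scaled by beta, and trains with a Bradley-Terry style loss that asks the preferred response's reward to exceed the dispreferred one's by a target margin gamma. The PyTorch sketch below illustrates that objective; the function name, argument names, and default values are illustrative assumptions, not the repository's exact API.

```python
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, chosen_lengths, rejected_lengths,
               beta=2.0, gamma_beta_ratio=0.5):
    """Minimal sketch of the reference-free SimPO objective.

    chosen_logps / rejected_logps: summed token log-probabilities of the
    preferred / dispreferred responses under the current policy, shape (batch,).
    chosen_lengths / rejected_lengths: response lengths in tokens, shape (batch,).
    """
    # Length-normalized implicit reward: average log-probability scaled by beta.
    # Note that no reference model appears anywhere in the reward.
    r_chosen = beta * chosen_logps / chosen_lengths
    r_rejected = beta * rejected_logps / rejected_lengths

    # Target reward margin gamma, parameterized as a ratio of beta.
    gamma = gamma_beta_ratio * beta

    # Bradley-Terry style loss: push the chosen reward above the rejected
    # reward by at least the margin gamma.
    return -F.logsigmoid(r_chosen - r_rejected - gamma).mean()
```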
Quick Start & Requirements
Clone the repository, which builds on alignment-handbook, and install dependencies using pip install .. Requires PyTorch v2.2.2 and Flash Attention 2; accelerate is used for distributed training. Example commands are provided for Mistral and Llama3 models.
Highlighted Details
Performance is sensitive to hyperparameter choices, with learning_rate, beta, and gamma_beta_ratio being key.
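As a toy illustration of how beta and gamma_beta_ratio feed into the objective, the snippet below calls the simpo_loss sketch from the How It Works section with made-up log-probabilities and lengths; the hyperparameter values are illustrative, not recommended settings.

```python
import torch

# Toy batch of two preference pairs; log-probabilities and lengths are made up.
chosen_logps = torch.tensor([-120.0, -95.0])
rejected_logps = torch.tensor([-150.0, -110.0])
chosen_lengths = torch.tensor([60.0, 50.0])
rejected_lengths = torch.tensor([70.0, 55.0])

# beta and gamma_beta_ratio mirror the key hyperparameters named above;
# the values here are placeholders, not the repo's tuned settings.
loss = simpo_loss(chosen_logps, rejected_logps,
                  chosen_lengths, rejected_lengths,
                  beta=2.0, gamma_beta_ratio=0.5)
print(loss.item())
```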
Maintenance & Community
The repository was last updated 10 months ago and is currently marked Inactive.
Licensing & Compatibility
The codebase builds on alignment-handbook, which typically uses Apache 2.0. A specific license for SimPO itself is not explicitly stated in the README but is expected to be permissive for research and commercial use.
Limitations & Caveats
Reproducing the reported evaluation results requires a pinned AlpacaEval version (alpaca-eval==0.6.2) due to recent changes in the evaluation library.