LLM-Blender  by yuchenlin

LLM ensembling framework using pairwise ranking and generative fusion

created 2 years ago
956 stars

Top 39.3% on sourcepulse

Project Summary

LLM-Blender is an ensembling framework designed to improve Large Language Model (LLM) performance by combining the strengths of multiple models. It targets researchers and developers who want to improve LLM output quality through pairwise ranking and generative fusion. Its components also support LLM evaluation and preference optimization, as listed under Highlighted Details.

How It Works

LLM-Blender employs two core modules: PairRanker and GenFuser. PairRanker uses pairwise comparisons to discern subtle differences in LLM outputs, identifying the best responses for specific inputs. GenFuser then merges these top-ranked candidates, capitalizing on their collective strengths to produce a superior, fused output. This dual approach addresses the variability in LLM performance across different tasks and examples.
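
The rank-then-select part of this flow can be sketched in a few lines. This is a minimal illustration, not the LLM-Blender API: `prefer` is a hypothetical stand-in for a learned pairwise ranker such as PairRanker, and aggregating by win count is one simple assumption about how pairwise outcomes could be turned into a ranking. The fusion step is omitted because it requires a generative model.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def rank_by_pairwise_wins(
    candidates: Sequence[str],
    prefer: Callable[[str, str], float],
) -> List[int]:
    """Return candidate indices ordered best-first by number of pairwise wins.

    `prefer(a, b)` is a hypothetical comparator: a positive value means `a`
    is judged better than `b`, a negative value means the opposite.
    """
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        score = prefer(candidates[i], candidates[j])
        if score > 0:
            wins[i] += 1
        elif score < 0:
            wins[j] += 1
    # sorted() is stable, so tied candidates keep their original order.
    return sorted(range(len(candidates)), key=lambda k: -wins[k])


def top_k(candidates: Sequence[str], order: Sequence[int], k: int) -> List[str]:
    """Pick the k best candidates; these would be handed to a fusion model."""
    return [candidates[i] for i in order[:k]]


if __name__ == "__main__":
    # Toy comparator (an assumption for this demo only): longer answers win.
    def toy_prefer(a: str, b: str) -> float:
        return len(a) - len(b)

    outputs = ["Paris", "The capital is Paris.", "Paris, France"]
    order = rank_by_pairwise_wins(outputs, toy_prefer)
    print(top_k(outputs, order, k=2))
```

Comparing every pair costs a number of comparator calls that grows quadratically with the number of candidates, which is acceptable for the handful of model outputs an ensemble typically produces.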

Quick Start & Requirements

  • Installation: pip install llm-blender or pip install git+https://github.com/yuchenlin/LLM-Blender.git
  • Prerequisites: Python, PyTorch, Hugging Face Transformers. GPU recommended for performance.
  • Usage: See example Jupyter notebook.

Highlighted Details

  • Introduces the MixInstruct benchmark dataset for large-scale LLM evaluation with oracle pairwise comparisons.
  • PairRM, a small (0.4B-parameter) pairwise reward model, approaches GPT-4's level of agreement with human preferences.
  • Supports Best-of-N sampling for improved response quality and direct integration with RLHF toolkits.
  • Enables Direct Preference Optimization (DPO) using its compare functionality.
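
The Best-of-N and DPO points above can be illustrated with a short sketch: a pairwise comparator selects the best of N sampled responses, and the same comparator labels a chosen/rejected pair. The `compare` callable is a hypothetical stand-in for a reward model such as PairRM, not the library's actual function signature, and the `prompt`/`chosen`/`rejected` keys are an assumed record layout.

```python
from typing import Callable, Dict, Sequence

# Hypothetical comparator type: returns True when the first response is
# preferred over the second for the given prompt.
Compare = Callable[[str, str, str], bool]


def best_of_n(prompt: str, samples: Sequence[str], compare: Compare) -> str:
    """Single pass over the samples, keeping the winner of each comparison."""
    if not samples:
        raise ValueError("need at least one sample")
    best = samples[0]
    for challenger in samples[1:]:
        if compare(prompt, challenger, best):
            best = challenger
    return best


def make_preference_pair(
    prompt: str, response_a: str, response_b: str, compare: Compare
) -> Dict[str, str]:
    """Label two responses as chosen/rejected for preference optimization."""
    if compare(prompt, response_a, response_b):
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


if __name__ == "__main__":
    # Toy comparator (an assumption for this demo only): shorter answers win.
    def toy_compare(prompt: str, a: str, b: str) -> bool:
        return len(a) < len(b)

    samples = ["a long rambling answer", "short", "medium one"]
    print(best_of_n("question", samples, toy_compare))
    print(make_preference_pair("question", samples[0], samples[1], toy_compare))
```

The single-pass selection uses N-1 comparisons and implicitly assumes the comparator is roughly transitive; a full pairwise tournament is more robust when it is not.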

Maintenance & Community

  • The project is associated with ACL 2023 and with the AI2-Mosaic and USC-INK groups.
  • PairRM is used in projects like snorkelai/Snorkel-Mistral-PairRM-DPO.
  • Community contributions are welcomed.

Licensing & Compatibility

  • The README does not explicitly state a license; confirm the licensing terms before commercial use.
  • Usage examples suggest compatibility with Hugging Face models and common Python environments.

Limitations & Caveats

  • The README does not explicitly state the license, which could be a concern for commercial adoption.
  • Training scripts and dataset construction code are provided, but require careful setup.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 19 stars in the last 90 days
