LLM-Blender by yuchenlin

LLM ensembling framework using pairwise ranking and generative fusion

Created 2 years ago

972 stars

Top 38.0% on SourcePulse

8 Experts Love This Project

winglian

Founder of Axolotl AI

pgarbacki

Cofounder of Fireworks AI

mlabonne

Head of Post-Training at Liquid AI

chiphuyen

Author of "AI Engineering", "Designing Machine Learning Systems"

and 4 more!

Project Summary

LLM-Blender is an ensembling framework that improves Large Language Model (LLM) output quality by combining the strengths of multiple models. It targets researchers and developers who want better responses than any single model provides, using pairwise ranking of candidate outputs followed by generative fusion of the best ones. Its ranker also serves as a tool for LLM evaluation and preference optimization.

How It Works

LLM-Blender employs two core modules: PairRanker and GenFuser. PairRanker uses pairwise comparisons to discern subtle differences in LLM outputs, identifying the best responses for specific inputs. GenFuser then merges these top-ranked candidates, capitalizing on their collective strengths to produce a superior, fused output. This dual approach addresses the variability in LLM performance across different tasks and examples.
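The rank-then-fuse flow described above can be illustrated with a minimal sketch. This is not the project's implementation: `rank_by_pairwise_wins`, `build_fusion_input`, and the length-based `toy_prefer` comparator are hypothetical names invented here, and a real PairRanker would be a learned model rather than a length heuristic. The sketch only shows how pairwise preferences can be aggregated into a ranking and how the top candidates might be packed into a single input for a fusion model.

```python
from itertools import combinations
from typing import Callable, List


def rank_by_pairwise_wins(
    candidates: List[str],
    prefer: Callable[[str, str], float],
) -> List[int]:
    """Return candidate indices ordered best-first by number of pairwise wins."""
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):
        # prefer(a, b) > 0 means a is judged better; ties go to the second item.
        if prefer(candidates[i], candidates[j]) > 0:
            wins[i] += 1
        else:
            wins[j] += 1
    # sorted() is stable, so candidates with equal wins keep their input order.
    return sorted(range(len(candidates)), key=lambda k: -wins[k])


def build_fusion_input(
    instruction: str, candidates: List[str], order: List[int], top_k: int
) -> str:
    """Pack the instruction and the top-k candidates into one text input."""
    top = [candidates[k] for k in order[:top_k]]
    parts = [f"Instruction: {instruction}"]
    parts += [f"Candidate {n}: {text}" for n, text in enumerate(top, start=1)]
    return "\n".join(parts)


# Toy comparator (an assumption for this demo only): the longer answer wins.
def toy_prefer(a: str, b: str) -> float:
    return len(a) - len(b)


candidates = ["bye!", "hi! I am fine, thanks!", "get out!"]
order = rank_by_pairwise_wins(candidates, toy_prefer)
fusion_input = build_fusion_input("hello, how are you!", candidates, order, top_k=2)
print(order)
print(fusion_input)
```

In a real pipeline the fusion input would be passed to a generative model that writes a new response, rather than printed.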

Quick Start & Requirements

Installation: pip install llm-blender or pip install git+https://github.com/yuchenlin/LLM-Blender.git
Prerequisites: Python, PyTorch, Hugging Face Transformers. GPU recommended for performance.
Usage: See example Jupyter notebook.
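As a rough illustration of what usage might look like after installation, the sketch below wraps a ranking call in a helper. The helper name `rank_with_pairrm` is hypothetical, and the `Blender`, `loadranker`, and `rank` calls and the `llm-blender/PairRM` checkpoint name are assumptions based on the project's published examples; check them against the example notebook for the installed version. The import is done inside the function so the snippet loads even where the package is not installed.

```python
from typing import List


def rank_with_pairrm(inputs: List[str], candidates_texts: List[List[str]]):
    """Rank each input's candidate responses with the PairRM ranker.

    Assumed API (verify against the example notebook): llm_blender.Blender,
    Blender.loadranker, and Blender.rank.
    """
    # Lazy import: requires `pip install llm-blender` before this is called.
    import llm_blender

    blender = llm_blender.Blender()
    # Assumed checkpoint name; the model is downloaded on first use.
    blender.loadranker("llm-blender/PairRM")
    return blender.rank(
        inputs, candidates_texts, return_scores=False, batch_size=1
    )
```

Calling `rank_with_pairrm(["hello, how are you!"], [["get out!", "hi! I am fine, thanks!", "bye!"]])` would be expected to return one set of ranks per input, one rank per candidate. A GPU is recommended, as noted above.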

Highlighted Details

Introduces the MixInstruct benchmark dataset for large-scale LLM evaluation with oracle pairwise comparisons.
PairRM, a small (0.4B) pairwise reward model, approaches GPT-4's alignment with human preference.
Supports Best-of-N sampling for improved response quality and direct integration with RLHF toolkits.
Enables Direct Preference Optimization (DPO) using its compare functionality.
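Best-of-N sampling and preference-pair construction for DPO can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: `best_of_n`, `to_preference_pair`, `toy_sample`, and `toy_score` are hypothetical names, the word-count scorer stands in for a learned reward model such as PairRM, and the `prompt`/`chosen`/`rejected` record layout is assumed here because it is a common input format for DPO trainers.

```python
from typing import Callable, Dict, List


def best_of_n(
    prompt: str,
    sample: Callable[[str, int], str],
    score: Callable[[str, str], float],
    n: int,
) -> str:
    """Draw n candidate responses and return the highest-scoring one."""
    candidates = [sample(prompt, i) for i in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))


def to_preference_pair(
    prompt: str, a: str, b: str, a_is_better: bool
) -> Dict[str, str]:
    """Turn one pairwise comparison into a DPO-style training record."""
    chosen, rejected = (a, b) if a_is_better else (b, a)
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


# Toy stand-ins (assumptions for this demo only).
CANNED: List[str] = ["ok", "Sure, here is a short answer.", "no"]


def toy_sample(prompt: str, seed: int) -> str:
    # A real sampler would call an LLM with a different random seed each time.
    return CANNED[seed % len(CANNED)]


def toy_score(prompt: str, response: str) -> float:
    # A real scorer would be a reward model; here more words means better.
    return float(len(response.split()))


prompt = "Explain ensembling."
best = best_of_n(prompt, toy_sample, toy_score, n=3)
pair = to_preference_pair(
    prompt,
    CANNED[0],
    CANNED[1],
    a_is_better=toy_score(prompt, CANNED[0]) > toy_score(prompt, CANNED[1]),
)
print(best)
print(pair)
```

With a pairwise ranker, the `a_is_better` flag would come from comparing the two responses directly instead of from two independent scores.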

Maintenance & Community

The project is associated with ACL 2023, AI2-Mosaic, and USC-INK.
PairRM is used in projects like snorkelai/Snorkel-Mistral-PairRM-DPO.
Community contributions are welcomed.

Licensing & Compatibility

The README does not state a license. Usage examples suggest compatibility with Hugging Face models and common Python environments. Confirm the licensing terms before commercial use.

Limitations & Caveats

Training scripts and dataset construction code are provided but require careful setup. The README does not state a license, which could be a concern for commercial adoption.

Health Check

Last Commit: 1 year ago

Responsiveness: Inactive

Pull Requests (30d): 1

Issues (30d): 0

Star History

1 star in the last 30 days

Explore Similar Projects

pyversity by Pringled

Fast library for retrieval result diversification

Created 3 months ago

Updated 1 month ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

deita by hkust-nlp

Data-efficient instruction tuning for LLM alignment (ICLR 2024)

Created 2 years ago

Updated 1 year ago

turbo-alignment by turbo-llm

Library for LLM industrial alignment

Created 1 year ago

Updated 3 months ago

RAGLAB by fate-ubw

RAG framework for research, modularity, and reproducibility

Created 1 year ago

Updated 1 year ago

Starred by Yaowei Zheng (Author of LLaMA-Factory), Maxime Labonne (Head of Post-Training at Liquid AI), and 1 more.

MixEval by JinjieNi

Dynamic LLM evaluation suite for accurate, cost-effective benchmarking

Created 1 year ago

Updated 1 year ago

Starred by Yaowei Zheng (Author of LLaMA-Factory) and Wei-Lin Chiang (Cofounder of LMArena).

instruct-eval by declare-lab

Evaluation code for instruction-tuned LLMs

Created 2 years ago

Updated 1 year ago

Starred by Jeremy Howard (Cofounder of fast.ai), Eugene Yan (AI Scientist at AWS), and 2 more.

Platypus by arielnlee

Code for fine-tuning LLMs using LoRA

Created 2 years ago

Updated 1 year ago

Starred by Andrey Vasnetsov (Cofounder of Qdrant).

FlashRank by PrithivirajDamodaran

Reranking library for search & retrieval pipelines

Created 2 years ago

Updated 1 week ago

Starred by Omar Khattab (Coauthor of DSPy, ColBERT; Professor at MIT).

Rankify by DataScienceUIBK

Python toolkit for retrieval, re-ranking, and RAG research

Created 11 months ago

Updated 2 months ago

Starred by Jeff Huber (Cofounder of Chroma) and Casper Hansen (Author of AutoAWQ).

rank_llm by castorini

Python toolkit for reproducible information retrieval research

Created 2 years ago

Updated 2 weeks ago

ditto by megagonlabs

Entity matching solution using pre-trained language models

Created 5 years ago

Updated 1 year ago

open-unlearning by locuslab

LLM unlearning framework for unifying evaluation benchmarks

Created 2 years ago

Updated 2 weeks ago
