lm-council by machine-theory

LLM council for democratic AI benchmarking

Created 1 year ago

287 stars

Top 91.6% on SourcePulse

1 Expert Loves This Project

Magnus Müller

Cofounder of Browser Use

Summary

This project provides a framework for evaluating Large Language Models (LLMs) in which the models form a "council" and democratically reach consensus on subjective prompts. It targets researchers and practitioners dealing with the limitations of human-curated benchmarks and the biases of individual LLMs, and offers a decentralized approach to self-assessment.

How It Works

Multiple LLMs are deployed to collectively judge and elect a "best" model for a given prompt, mimicking a democratic process. This uses LLM-as-a-Judge in a decentralized, consensus-driven way, with the aim of reducing the subjectivity and value-laden nature of traditional LLM evaluations.
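The election idea described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name elect_best, the one-vote-per-judge input format, the exclusion of self-votes, and the alphabetical tie-break are all assumptions made for illustration. The project's actual aggregation method may differ.

```python
from collections import Counter


def elect_best(votes: dict[str, str]) -> tuple[str, dict[str, int]]:
    """Pick a winner from council votes (hypothetical helper, not the lm-council API).

    `votes` maps each judge model's name to the name of the model whose
    response that judge preferred.
    """
    # Assumption: a judge's vote for itself is discarded to limit self-preference.
    tally = Counter(choice for judge, choice in votes.items() if judge != choice)
    if not tally:
        raise ValueError("no valid votes to count")
    top = max(tally.values())
    # Assumption: ties are broken alphabetically so the result is deterministic.
    winners = sorted(model for model, count in tally.items() if count == top)
    return winners[0], dict(tally)


# Placeholder model names; each key is a judge, each value is its preferred model.
council_votes = {
    "model-a": "model-b",
    "model-b": "model-b",  # self-vote, ignored
    "model-c": "model-b",
    "model-d": "model-a",
}
winner, tally = elect_best(council_votes)
```

A single plurality vote is the simplest possible aggregation; a real council could just as well collect rankings or pairwise preferences from each judge and combine those.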

Quick Start & Requirements

Install via pip: pip install lm-council. An OpenRouter API key must be configured in a .env file before use. The library can run a council on a single prompt or on multiple prompts in parallel, and can save and load council state.

Official resources:
Website: https://llm-council.com
Dataset: https://huggingface.co/datasets/llm-council/emotional_application
Paper: https://arxiv.org/abs/2406.08598
Talk recording: https://youtu.be/hI0XCE27QqE
Slides: https://bit.ly/44XSEnh
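As a rough sketch of the .env prerequisite, the snippet below reads a key from a .env file using only the Python standard library. The variable name OPENROUTER_API_KEY and both helper functions are assumptions for illustration; check the project's README for the exact name it expects and for how the library itself loads the file.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (hypothetical helper)."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openrouter_key(path: str = ".env") -> str:
    """Return the key from the environment, falling back to the .env file."""
    # Assumption: the key is stored under the name OPENROUTER_API_KEY.
    name = "OPENROUTER_API_KEY"
    key = os.environ.get(name) or load_env_file(path).get(name)
    if not key:
        raise RuntimeError(f"{name} is not set in the environment or in {path}")
    return key
```

In practice a dedicated package such as python-dotenv handles quoting and edge cases more robustly than this hand-rolled parser.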

Highlighted Details

Pioneers LLM-as-a-Judge in a democratic setting for highly subjective tasks.
Features a case study benchmarking 20 LLMs on emotional intelligence.
Paper accepted to NAACL 2025 Main Conference.
Utilizes OpenRouter for unified API access across diverse models.

Maintenance & Community

The project is associated with authors Justin Zhao, Flor Miriam Plaza-del-Arco, Benjamin Genchel, and Amanda Cercas Curry. No specific community channels (e.g., Discord, Slack) or explicit roadmap details are provided in the README.

Licensing & Compatibility

The README does not state a license for this repository.

Limitations & Caveats

The system depends on the OpenRouter API for model access. The project appears research-oriented, stemming from a specific paper, and may need further development or adaptation before it can serve as a general-purpose or production-ready evaluation suite.

Health Check

Last Commit

7 months ago

Responsiveness

Inactive

Star History

11 stars in the last 30 days