LLM benchmark for social reasoning, strategy, and deception
This repository introduces the Elimination Game, a multi-player benchmark designed to evaluate Large Language Models (LLMs) on social reasoning, strategy, and deception. It pits 8 LLMs against each other in rounds of public and private communication, alliance formation, and voting to eliminate peers, culminating in a jury vote for the winner. The benchmark aims to uncover how LLMs navigate complex social dynamics and strategic decision-making.
How It Works
The game simulates a tournament in which LLMs act as players. Each round consists of a public subround for open communication, preference rankings that determine private pairings, private subrounds for alliance discussions, and an anonymous vote to eliminate one player, with tie-breaking for close votes. Rather than measuring simple task completion, the benchmark evaluates emergent behavior in social contexts: strategic foresight and the ability to manage hidden intentions and alliances.
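The round structure described above can be sketched as a minimal simulation loop. This is an illustrative sketch only, not the repository's actual code: the agents are random stubs standing in for LLM calls, and all names (`run_round`, `prefs`, etc.) are hypothetical.

```python
import random
from collections import Counter

def run_round(players, rng):
    """One elimination round: public talk, private pairings, anonymous vote."""
    # Public subround: every player broadcasts a message (stubbed out here;
    # in the real benchmark this would be an LLM completion).
    for p in players:
        _ = f"{p} speaks publicly"

    # Preference rankings: each player ranks the others for private chats
    # (random stand-in for an LLM-produced ranking).
    prefs = {p: rng.sample([q for q in players if q != p], len(players) - 1)
             for p in players}

    # Private subrounds: greedily pair players by stated preference.
    unpaired = set(players)
    pairs = []
    for p in players:
        if p not in unpaired:
            continue
        for q in prefs[p]:
            if q in unpaired and q != p:
                pairs.append((p, q))
                unpaired -= {p, q}
                break

    # Anonymous voting: each player votes to eliminate one peer (random stub).
    votes = Counter(rng.choice([q for q in players if q != p]) for p in players)

    # Tie-break: among players tied for the most votes, eliminate one at random.
    max_votes = max(votes.values())
    tied = [name for name, n in votes.items() if n == max_votes]
    return rng.choice(tied), votes

players = [f"P{i}" for i in range(1, 9)]  # 8 LLM players
rng = random.Random(0)
while len(players) > 2:
    eliminated, _ = run_round(players, rng)
    players.remove(eliminated)
# A jury vote among eliminated players would then pick the winner
# from the finalists (omitted here).
```

Running the loop eliminates one player per round until two finalists remain; the jury stage that decides the winner is left as a stub.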
Quick Start & Requirements
The README does not provide specific installation or execution commands, nor does it detail explicit software or hardware requirements beyond the implicit need to run LLMs.
Maintenance & Community
The project is maintained by lechmazur, who can be followed on GitHub for updates. No specific community channels (Discord, Slack) or roadmap details are provided in the README.
Licensing & Compatibility
The README does not specify a license.
Limitations & Caveats
The README lacks explicit installation instructions, dependency lists, and licensing information, all of which hinder adoption and reproducibility. Running the benchmark also implies non-trivial cost and setup effort, since it requires API access to (or local hosting of) eight LLMs per game.