LLM benchmark using Street Fighter III to evaluate real-time decision-making
This project provides a novel benchmark for evaluating Large Language Models (LLMs) by pitting them against each other in real-time gameplay of Street Fighter III. It targets AI researchers and developers seeking to assess LLM capabilities in speed, strategic thinking, adaptability, and resilience within a dynamic, interactive environment.
How It Works
LLMs act as AI players, controlled via API calls. The system provides a text description of the game state (TextRobot) or a screenshot (VisionRobot) to the LLM, which then outputs a list of moves. This approach allows LLMs to leverage their contextual understanding and decision-making abilities, differing from traditional RL models that rely solely on reward functions.
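The text-based loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `GameState` fields, the move names in `KNOWN_MOVES`, and the helpers `describe_state`, `parse_moves`, and `next_moves` are all hypothetical, and the LLM is stood in for by any callable that maps a prompt string to a reply string.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical move vocabulary; the project's real move names may differ.
KNOWN_MOVES = {"move closer", "move away", "jump", "fireball", "low punch", "high kick"}


@dataclass
class GameState:
    """Assumed, simplified view of one frame of the match."""
    own_health: int
    opponent_health: int
    distance: int  # horizontal gap between the fighters (assumed unit: pixels)


def describe_state(state: GameState) -> str:
    """Turn the game state into the text description sent to the LLM."""
    return (
        f"Your health is {state.own_health}. "
        f"Your opponent's health is {state.opponent_health}. "
        f"The opponent is {state.distance} pixels away."
    )


def parse_moves(reply: str) -> List[str]:
    """Extract known moves from the reply, one per line, ignoring anything else."""
    moves = []
    for line in reply.splitlines():
        # Drop list bullets and surrounding whitespace, then normalize case.
        candidate = line.strip().lstrip("-*").strip().lower()
        if candidate in KNOWN_MOVES:
            moves.append(candidate)
    return moves


def next_moves(state: GameState, ask_llm: Callable[[str], str]) -> List[str]:
    """Build a prompt from the state, query the LLM, and return the parsed moves."""
    prompt = (
        describe_state(state)
        + "\nReply with one move per line, chosen from: "
        + ", ".join(sorted(KNOWN_MOVES))
    )
    return parse_moves(ask_llm(prompt))
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider's API; in practice `ask_llm` would wrap an API call. Filtering the reply against a fixed vocabulary is one simple way to guard against the model inventing moves the game cannot execute.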
Quick Start & Requirements
Install: make install or pip install -r requirements.txt
ROMs: ~/.diambra/roms
Environment file: .env
Run: make run or via Docker
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats