TextArena by TextArena

LLM evaluation and training framework

Created 1 year ago

338 stars

Top 81.6% on SourcePulse

2 Experts Love This Project

Will Brown

Research Lead at Prime Intellect

Logan Kilpatrick

Product Lead on Google AI Studio

Project Summary

TextArena is a framework for training, evaluating, and benchmarking large language models (LLMs) using text-based games. It offers over 100 single-, two-, and multi-player games, providing a flexible and extensible environment for LLM research in reinforcement learning and competitive play. Its primary benefit is a standardized interface across diverse game environments, which simplifies developing and comparing LLM agents.

How It Works

TextArena provides an OpenAI Gym-style interface, allowing seamless integration with existing RL and LLM frameworks. Agents interact with game environments by receiving string observations and returning string actions. The framework supports various agent implementations, including those using APIs like OpenRouter, enabling LLMs such as GPT-4o-mini and Claude 3.5 Haiku to compete in games like TicTacToe.
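The string-in, string-out loop described above can be sketched with a toy environment. This is a minimal illustration, not TextArena's actual API: the class ToyNimEnv, its method names, the play helper, and both agents are hypothetical stand-ins chosen to mirror a Gym-style reset/observe/step cycle.

```python
from typing import Callable, Dict, Optional, Tuple

Agent = Callable[[str], str]  # an agent maps a string observation to a string action


class ToyNimEnv:
    """Hypothetical two-player game: take 1-3 stones; whoever takes the last stone wins."""

    def __init__(self, stones: int = 7) -> None:
        self.start = stones
        self.stones = stones
        self.num_players = 2
        self.current = 0
        self.winner: Optional[int] = None

    def reset(self, num_players: int = 2) -> None:
        self.stones = self.start
        self.num_players = num_players
        self.current = 0
        self.winner = None

    def get_observation(self) -> Tuple[int, str]:
        # Observations are plain strings, tagged with the id of the player to move.
        return self.current, f"{self.stones} stones left. Take 1, 2 or 3."

    def step(self, action: str) -> bool:
        """Apply a string action; return True once the game is over."""
        try:
            take = int(action.strip())
        except ValueError:
            take = 0  # unparseable text is treated as an invalid move
        opponent = (self.current + 1) % self.num_players
        if not 1 <= take <= min(3, self.stones):
            self.winner = opponent  # an invalid move forfeits the game
            return True
        self.stones -= take
        if self.stones == 0:
            self.winner = self.current
            return True
        self.current = opponent
        return False

    def close(self) -> Dict[int, int]:
        # +1 for the winner, -1 for everyone else.
        return {p: (1 if p == self.winner else -1) for p in range(self.num_players)}


def greedy_agent(observation: str) -> str:
    # Parse the stone count from the observation and leave a multiple of 4 if possible.
    stones = int(observation.split()[0])
    return str(stones % 4 or 1)


def always_one_agent(observation: str) -> str:
    return "1"


def play(env: ToyNimEnv, agents: Dict[int, Agent]) -> Dict[int, int]:
    """Run one episode: observe, act, step, until the environment reports done."""
    env.reset(num_players=len(agents))
    done = False
    while not done:
        player_id, observation = env.get_observation()
        action = agents[player_id](observation)
        done = env.step(action)
    return env.close()


if __name__ == "__main__":
    print(play(ToyNimEnv(7), {0: greedy_agent, 1: always_one_agent}))
```

Because agents are just callables from string to string, an LLM-backed agent could in principle replace either function without changing the loop, which is the property the standardized interface is meant to provide.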

Quick Start & Requirements

Installation: pip install textarena
Prerequisites: An OpenRouter API key is required to run OpenRouter-backed agents such as GPT-4o-mini or Claude 3.5 Haiku.
Setup: Minimal; the main step is configuring the API key.
Links: Play, Leaderboard, Games, Examples
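Since setup mostly comes down to API key configuration, a small pre-flight check can fail fast when the key is missing. The sketch below is illustrative only: the helper get_api_key is hypothetical, and the environment variable name OPENROUTER_API_KEY is an assumption that should be checked against the project's README.

```python
import os
from typing import Mapping, Optional


def get_api_key(
    env: Optional[Mapping[str, str]] = None,
    name: str = "OPENROUTER_API_KEY",  # assumed variable name, not confirmed by the source
) -> str:
    """Return the API key from the environment, or raise a clear error if it is unset."""
    source = os.environ if env is None else env
    key = source.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running API-backed agents.")
    return key
```

Passing a mapping explicitly, for example get_api_key({"OPENROUTER_API_KEY": "sk-..."}), keeps the check easy to exercise without touching the real environment.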

Highlighted Details

Supports over 100 text-based games for LLM evaluation.
Integrates with popular LLMs through API providers (e.g., OpenRouter).
Features a leaderboard for tracking agent performance.
Actively updated with new games (e.g., Settlers of Catan) and research contributions (e.g., SPIRAL, MindGames).

Maintenance & Community

Active development with recent updates in July 2025.
Community support available via Discord.
Contributions are welcomed, with specific issues listed for potential enhancements.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

Because the README does not specify a license, commercial adoption and integration with closed-source projects carry legal uncertainty. Some example code requires external API keys. The initial demo release reportedly caused server issues, which may point to scalability limits.

Health Check

Last Commit

3 days ago

Responsiveness

Inactive

Star History

14 stars in the last 30 days