TextArena by LeonGuertler

LLM evaluation and training framework

Created 1 year ago
276 stars

Top 93.9% on SourcePulse

Project Summary

TextArena is a framework for training, evaluating, and benchmarking large language models (LLMs) using text-based games. It offers over 100 single-player, two-player, and multi-player games, providing a flexible and extensible environment for LLM research in reinforcement learning and competitive play. Its main benefit is a standardized interface across diverse game environments, which simplifies developing and comparing LLM agents.

How It Works

TextArena provides an OpenAI Gym-style interface, which makes it straightforward to plug into existing RL and LLM frameworks. Agents interact with game environments by receiving string observations and returning string actions. The framework supports several agent implementations, including API-backed agents such as those using OpenRouter, so LLMs like GPT-4o-mini and Claude 3.5 Haiku can play against each other in games such as TicTacToe.
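The string-in, string-out loop described above can be sketched without any dependencies. Everything here is hypothetical: `ToyTicTacToeEnv`, `first_free_cell_agent`, and `play` are illustrative names, and the `reset`/`get_observation`/`step` methods only follow the general Gym pattern. They are not guaranteed to match TextArena's actual classes or signatures; the project's Examples are the authority on the real API.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# An agent is any callable that maps a text observation to a text action.
Agent = Callable[[str], str]

# All eight winning lines on a 3x3 board, cells numbered 0-8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


class ToyTicTacToeEnv:
    """Hypothetical two-player TicTacToe with a Gym-style text interface.

    Not TextArena's code: it only mimics the observe -> act -> step loop.
    """

    def reset(self) -> None:
        self.board: List[str] = [" "] * 9
        self.current = 0  # player 0 plays "X", player 1 plays "O"
        self.winner: Optional[int] = None

    def get_observation(self) -> Tuple[int, str]:
        # Render the board and the legal moves as plain text for the acting player.
        rows = ["|".join(self.board[r * 3:r * 3 + 3]) for r in range(3)]
        legal = " ".join(str(i) for i, c in enumerate(self.board) if c == " ")
        text = "\n".join(rows) + f"\nYou are player {self.current}. Legal moves: {legal}"
        return self.current, text

    def step(self, action: str) -> bool:
        """Apply a text action such as '[4]'; return True once the game is over."""
        match = re.search(r"\d+", action)
        cell = int(match.group()) if match else -1
        if not 0 <= cell <= 8 or self.board[cell] != " ":
            self.winner = 1 - self.current  # illustrative rule: an invalid move forfeits
            return True
        mark = "XO"[self.current]
        self.board[cell] = mark
        if any(all(self.board[i] == mark for i in line)
               for line in WIN_LINES if cell in line):
            self.winner = self.current
            return True
        if " " not in self.board:
            return True  # board full with no winner: a draw (winner stays None)
        self.current = 1 - self.current
        return False


def first_free_cell_agent(observation: str) -> str:
    # Trivial baseline agent: choose the lowest-numbered legal cell.
    legal = observation.rsplit("Legal moves:", 1)[1].split()
    return f"[{legal[0]}]"


def play(env: ToyTicTacToeEnv, agents: Dict[int, Agent]) -> Optional[int]:
    """Run one game and return the winning player id, or None for a draw."""
    env.reset()
    done = False
    while not done:
        player_id, observation = env.get_observation()
        action = agents[player_id](observation)  # string in, string out
        done = env.step(action)
    return env.winner
```

Replacing `first_free_cell_agent` with a function that forwards the observation to an LLM and returns its reply leaves the loop untouched, which is the practical payoff of a string-only contract between agents and environments.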

Quick Start & Requirements

  • Installation: pip install textarena
  • Prerequisites: An OpenRouter API key is required to use the OpenRouter-backed agents, e.g., GPT-4o-mini or Claude 3.5 Haiku.
  • Setup: Minimal; mostly a matter of configuring the API key.
  • Links: Play, Leaderboard, Games, Examples
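Because setup mostly comes down to the API key, a small guard that fails early can save a confusing error in the middle of a game. This sketch assumes the key is supplied through an `OPENROUTER_API_KEY` environment variable; treat that name as an assumption and confirm it against the TextArena README. `require_api_key` is a hypothetical helper, not part of the library.

```python
import os
from typing import Mapping, Optional


def require_api_key(name: str = "OPENROUTER_API_KEY",
                    environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the stripped API key, or raise before any API-backed agent is created.

    `name` is an assumed variable name; `environ` defaults to the real process environment.
    """
    source = os.environ if environ is None else environ
    key = source.get(name, "").strip()
    if not key:
        # Typical shell setup beforehand: export OPENROUTER_API_KEY="..."
        raise RuntimeError(f"{name} is not set; export it before running API-backed agents.")
    return key
```

Passing a plain dict as `environ` keeps the check testable without touching the real environment.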

Highlighted Details

  • Supports over 100 text-based games for LLM evaluation.
  • Integrates with popular LLMs through API providers such as OpenRouter.
  • Features a leaderboard for tracking agent performance.
  • Actively updated with new games (e.g., Settlers of Catan) and research contributions (e.g., SPIRAL, MindGames).

Maintenance & Community

  • Active development with recent updates in July 2025.
  • Community support available via Discord.
  • Contributions are welcome; specific issues are listed as candidate enhancements.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The README does not specify a license, which may impact commercial adoption or integration with closed-source projects. Some example code relies on external API keys, and the initial demo release reportedly caused server issues, suggesting potential scalability challenges.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 15
  • Issues (30d): 2
  • Star history: 32 stars in the last 30 days
