MLGym by facebookresearch

Gym environment for ML research agents

created 5 months ago
536 stars

Top 60.0% on sourcepulse

Project Summary

MLGym is an experimental framework and benchmark designed for advancing AI research agents, particularly LLM agents. It provides a diverse set of 13 open-ended AI research tasks across computer vision, NLP, RL, and game theory, requiring real-world AI research skills for problem-solving. The primary goal is to benchmark LLM agents and facilitate RL-based training in a research environment.

How It Works

MLGym operates by running AI research tasks within isolated containers (Docker or Podman), ensuring reproducible environments. It supports GPU acceleration for computationally intensive tasks. The framework orchestrates agent interactions with these tasks, allowing for the evaluation and training of agents using various ML algorithms, with a focus on reinforcement learning.
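
Because tasks run inside Docker or Podman, it can help to confirm that one of the two is installed before launching anything. The following is a minimal sketch of such a check, not part of MLGym itself: the function name detect_container_runtime and its parameters are hypothetical, and it only tests whether an executable is on PATH, not whether the daemon or Podman machine is actually running.

```python
import shutil


def detect_container_runtime(candidates=("docker", "podman"), which=shutil.which):
    """Return the first candidate runtime found on PATH, or None.

    Hypothetical helper for illustration; `which` is injectable so the
    lookup can be replaced in tests.
    """
    for name in candidates:
        if which(name):
            return name
    return None


if __name__ == "__main__":
    runtime = detect_container_runtime()
    print(runtime or "no container runtime found")
```

The returned name matches the values the container type argument is described as accepting (Docker or Podman), so it could be passed along when invoking run.py.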

Quick Start & Requirements

  • Installation: Clone the repository, create a Python 3.11 conda environment, and install the package in editable mode by running pip install -e . from the repository root.
  • Prerequisites: Requires Docker or Podman. For GPU support on Linux, nvidia-container-toolkit is necessary. macOS users need to set up a Podman machine and export DOCKER_HOST. API keys for services such as OpenAI and Anthropic can be configured via a .env file.
  • Running Tasks: Use python run.py with arguments specifying the container type, task configuration, model, and resource limits, for example: python run.py --container_type docker --task_config_path tasks/battleOfSexes.yaml --model litellm:claude-3-5-sonnet-20240620 --gpus 0
  • Documentation: Detailed documentation is under construction. A trajectory visualizer is available via streamlit run demo/trajectory_visualizer.py.
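
The run command above can also be assembled programmatically, which is convenient when sweeping over tasks or models. The sketch below is an illustration only: build_run_command is a hypothetical helper, and it uses just the four flags shown in the example (--container_type, --task_config_path, --model, --gpus); any other options run.py may accept are not covered here.

```python
import shlex


def build_run_command(container_type, task_config_path, model, gpus):
    """Build the argument list for run.py from the flags in the example.

    Hypothetical helper; returns a list suitable for subprocess.run.
    """
    return [
        "python", "run.py",
        "--container_type", container_type,
        "--task_config_path", task_config_path,
        "--model", model,
        "--gpus", str(gpus),
    ]


cmd = build_run_command(
    "docker",
    "tasks/battleOfSexes.yaml",
    "litellm:claude-3-5-sonnet-20240620",
    0,
)
# Show the command as a single shell-style string.
print(shlex.join(cmd))
```

To actually launch a task, the list could be passed to subprocess.run(cmd, check=True) from the repository root, assuming the environment and container runtime are already set up as described.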

Highlighted Details

  • Benchmarks 13 diverse AI research tasks.
  • Supports LLM agent training and evaluation.
  • Offers a trajectory visualizer for inspecting agent behavior.
  • Designed for reproducible research environments using containers.

Maintenance & Community

Maintained by GenAI at Meta and UCSB NLP. Contribution guidelines and a maintenance plan are available.

Licensing & Compatibility

The majority of the code is licensed under CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International). SWE-Agent and Modded-NanoGPT are MIT licensed; Gymnax and Gymnax-blines are Apache 2.0 licensed. The non-commercial clause restricts use in proprietary or commercial applications.

Limitations & Caveats

MLGym is an experimental framework under heavy development, with potential for major design changes. The non-commercial license may limit adoption for commercial use cases.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 4
  • Issues (30d): 2

Star History

57 stars in the last 90 days
