MLGym by facebookresearch

Gym environment for ML research agents

Created 10 months ago

584 stars

Top 55.5% on SourcePulse

2 Experts Love This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Wing Lian

Founder of Axolotl AI

Project Summary

MLGym is an experimental framework and benchmark for developing AI research agents, particularly LLM agents. It provides 13 open-ended AI research tasks spanning computer vision, natural language processing (NLP), reinforcement learning (RL), and game theory, each of which calls for real-world AI research skills. Its primary goals are to benchmark LLM agents and to support RL-based training of agents in a research environment.

How It Works

MLGym runs each AI research task inside an isolated container (Docker or Podman), which keeps environments reproducible, and it supports GPU acceleration for computationally intensive tasks. The framework orchestrates the agent's interactions with a task, so agents can be evaluated and trained with a range of ML algorithms, with a focus on reinforcement learning.
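Because every task runs in a container, a quick pre-flight check for a container CLI can save a failed run. The following is a minimal sketch, not part of MLGym: the helper name `detect_container_runtime` and its `which` parameter are hypothetical, and it only checks that a `docker` or `podman` executable is on `PATH`, not that the daemon or Podman machine is actually running.

```python
import shutil
from typing import Callable, Optional, Sequence


def detect_container_runtime(
    candidates: Sequence[str] = ("docker", "podman"),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first container CLI found on PATH, or None.

    Hypothetical helper for illustration; `which` is injectable so the
    lookup can be replaced in tests.
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    runtime = detect_container_runtime()
    print(runtime or "no container runtime found on PATH")
```

Docker is tried first here only because the order of `candidates` says so; pass a different tuple to prefer Podman.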

Quick Start & Requirements

Installation: Clone the repository, create a Python 3.11 conda environment, and from the repository root run pip install -e . to install the package in editable mode.
Prerequisites: Docker or Podman is required. GPU support on Linux additionally requires nvidia-container-toolkit. macOS users need to set up a Podman machine and export DOCKER_HOST. API keys for services such as OpenAI and Anthropic can be configured in a .env file.
Running Tasks: Run python run.py with arguments that specify the container type, task configuration, model, and resource limits. For example, the following command runs the Battle of the Sexes task: python run.py --container_type docker --task_config_path tasks/battleOfSexes.yaml --model litellm:claude-3-5-sonnet-20240620 --gpus 0
Documentation: Detailed documentation is under construction. A trajectory visualizer is available via streamlit run demo/trajectory_visualizer.py.
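When launching several tasks or models, it can help to assemble the run.py invocation programmatically. The sketch below is an illustration only: `build_run_command` is a hypothetical helper, and it uses just the four flags that appear in the example above (`--container_type`, `--task_config_path`, `--model`, `--gpus`). Any other run.py options are not covered, and defaulting `container_type` to `docker` is an assumption.

```python
import shlex
from typing import List


def build_run_command(
    task_config_path: str,
    model: str,
    container_type: str = "docker",  # assumed default, taken from the example
    gpus: str = "0",
) -> List[str]:
    """Build the argument list for a run.py invocation.

    Hypothetical helper: only the flags shown in the example command
    are used here.
    """
    return [
        "python", "run.py",
        "--container_type", container_type,
        "--task_config_path", task_config_path,
        "--model", model,
        "--gpus", str(gpus),
    ]


cmd = build_run_command(
    task_config_path="tasks/battleOfSexes.yaml",
    model="litellm:claude-3-5-sonnet-20240620",
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell-quoting issues with paths or model names.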

Highlighted Details

Benchmarks agents on 13 diverse AI research tasks.
Supports LLM agent training and evaluation.
Offers a trajectory visualizer for inspecting agent behavior.
Designed for reproducible research environments using containers.

Maintenance & Community

Maintained by GenAI at Meta and UCSB NLP. Contribution guidelines and a maintenance plan are available.

Licensing & Compatibility

The majority of the code is licensed under CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International). SWE-Agent and Modded-NanoGPT are MIT licensed; Gymnax and Gymnax-blines are Apache 2.0 licensed. The non-commercial clause restricts use in proprietary or commercial applications.

Limitations & Caveats

MLGym is an experimental framework under heavy development, with potential for major design changes. The non-commercial license may limit adoption for commercial use cases.

Health Check

Last Commit

5 months ago

Responsiveness

1 week

Star History

5 stars in the last 30 days