Gym environment for ML research agents
MLGym is an experimental framework and benchmark designed for advancing AI research agents, particularly LLM agents. It provides a diverse set of 13 open-ended AI research tasks across computer vision, NLP, RL, and game theory, requiring real-world AI research skills for problem-solving. The primary goal is to benchmark LLM agents and facilitate RL-based training in a research environment.
How It Works
MLGym operates by running AI research tasks within isolated containers (Docker or Podman), ensuring reproducible environments. It supports GPU acceleration for computationally intensive tasks. The framework orchestrates agent interactions with these tasks, allowing for the evaluation and training of agents using various ML algorithms, with a focus on reinforcement learning.
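To make the container isolation concrete, here is a minimal sketch of how a command line for a throwaway container could be assembled. It is an illustration only, not MLGym's actual implementation: the function name build_container_command and the image name example/research-task:latest are hypothetical, and GPU passthrough is shown for Docker only, since Podman configures GPU access differently.

```python
import shlex


def build_container_command(runtime, image, command, use_gpu=False):
    """Return an argv list that runs `command` in a disposable container.

    Hypothetical helper for illustration; MLGym may manage containers
    through a different mechanism.
    """
    if runtime not in ("docker", "podman"):
        raise ValueError(f"unsupported container runtime: {runtime!r}")

    # --rm removes the container on exit, so each run starts from a clean state.
    argv = [runtime, "run", "--rm"]

    if use_gpu and runtime == "docker":
        # Docker's --gpus flag relies on nvidia-container-toolkit on the host.
        # Podman GPU setup differs and is deliberately left out of this sketch.
        argv += ["--gpus", "all"]

    argv.append(image)
    argv += list(command)
    return argv


if __name__ == "__main__":
    # The image name below is a placeholder, not a real MLGym image.
    argv = build_container_command(
        "docker", "example/research-task:latest", ["python", "train.py"], use_gpu=True
    )
    print(shlex.join(argv))
```

Returning an argv list rather than a shell string keeps quoting out of the picture; the list can be handed to subprocess.run as-is.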
Quick Start & Requirements
Install: pip install -e .
Prerequisites: a container runtime (Docker or Podman). For GPU use, nvidia-container-toolkit is necessary. macOS users need to set up a Podman machine and export DOCKER_HOST.
API keys: keys for services like OpenAI and Anthropic can be configured via a .env file.
Run: python run.py with arguments specifying container type, task configuration, model, and resource limits. Example: python run.py --container_type docker --task_config_path tasks/battleOfSexes.yaml --model litellm:claude-3-5-sonnet-20240620 --gpus 0
Trajectory visualizer: streamlit run demo/trajectory_visualizer.py
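When launching several runs, it can help to assemble the run.py command programmatically. The sketch below uses only the flags that appear in the example command above (--container_type, --task_config_path, --model, --gpus); the helper name build_run_command is hypothetical, and the --gpus value is passed through unchanged without assuming how run.py interprets it.

```python
import shlex


def build_run_command(task_config_path, model, container_type="docker", gpus=None):
    """Build an argv list for run.py from the flags shown in the example command.

    Hypothetical helper for illustration; it does not validate the values
    against what run.py actually accepts.
    """
    argv = [
        "python", "run.py",
        "--container_type", container_type,
        "--task_config_path", task_config_path,
        "--model", model,
    ]
    if gpus is not None:
        # Forward the value exactly as given.
        argv += ["--gpus", str(gpus)]
    return argv


if __name__ == "__main__":
    cmd = build_run_command(
        "tasks/battleOfSexes.yaml",
        "litellm:claude-3-5-sonnet-20240620",
        gpus=0,
    )
    # Prints the command for inspection; to execute it, pass `cmd` to
    # subprocess.run(cmd, check=True) from the repository root.
    print(shlex.join(cmd))
```

Printing the command before executing it makes it easy to confirm that it matches the documented example.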
Highlighted Details
Maintenance & Community
Maintained by GenAI at Meta and UCSB NLP. Contribution guidelines and a maintenance plan are available.
Licensing & Compatibility
The majority of the code is licensed under CC-BY-NC 4.0 (Attribution-NonCommercial 4.0 International). SWE-Agent and Modded-NanoGPT are MIT licensed; Gymnax and Gymnax-blines are Apache 2.0 licensed. The non-commercial clause restricts use in proprietary or commercial applications.
Limitations & Caveats
MLGym is an experimental framework under heavy development, with potential for major design changes. The non-commercial license may limit adoption for commercial use cases.