Self-play RL for multiplayer environments
This project enables training AI agents for custom multiplayer environments using self-play reinforcement learning, specifically Proximal Policy Optimization (PPO). It's designed for researchers and developers interested in multi-agent AI and game development, offering a structured approach to evolving AI opponents.
How It Works
The core innovation is a wrapper that transforms multiplayer environments into single-player ones for PPO. It manages opponent turn-taking and delays the reward signal until all players have acted. New policy versions are periodically added to a bank of frozen networks, creating a constantly evolving training landscape in which agents learn against increasingly sophisticated versions of themselves.
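As a rough illustration of that mechanism, here is a minimal sketch of such a wrapper. The names (`SelfPlayWrapper`, `opponent_bank`, `n_players`, `current_player`, `act`) are assumptions for the example, not the project's actual API:

```python
import random


class SelfPlayWrapper:
    """Sketch: expose an N-player turn-based env as single-player.

    The learning agent controls seat 0; the remaining seats are played
    by frozen policies sampled from a bank of earlier checkpoints.
    Hypothetical API, not the repo's implementation.
    """

    def __init__(self, env, opponent_bank):
        self.env = env
        self.opponent_bank = opponent_bank  # frozen past versions of the agent

    def reset(self):
        # Sample one frozen opponent per non-learning seat each episode,
        # so the agent faces a mix of its own past selves.
        self.opponents = [random.choice(self.opponent_bank)
                          for _ in range(self.env.n_players - 1)]
        obs = self.env.reset()
        obs, _, _, _ = self._advance_opponents(obs)
        return obs

    def step(self, action):
        # Apply the agent's move, then let every opponent take a turn.
        # The reward only becomes meaningful once the full round has
        # resolved, so it is accumulated across the opponents' turns.
        obs, reward, done, info = self.env.step(action)
        if not done:
            obs, delayed, done, info = self._advance_opponents(obs)
            reward += delayed
        return obs, reward, done, info

    def _advance_opponents(self, obs):
        # Step frozen opponents until it is the agent's turn again,
        # summing any reward credited to the learning agent on the way.
        total, done, info = 0.0, False, {}
        while not done and self.env.current_player != 0:
            opponent = self.opponents[self.env.current_player - 1]
            action = opponent.act(self.env.observation, self.env.legal_actions)
            obs, reward, done, info = self.env.step(action)
            total += reward
        return obs, total, done, info
```

Sampling opponents from the whole bank, rather than always using the latest checkpoint, is what keeps the curriculum diverse and guards against the agent overfitting to a single opponent.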
Quick Start & Requirements
Run the Docker container (`docker-compose up -d`), install a specific environment (e.g., `bash ./scripts/install_env.sh sushigo`), and start self-play training (e.g., `mpirun -np 10 python3 train.py -e sushigo`).
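Collected as a single shell session for convenience (the clone step is an assumption; the summary does not name the repository URL):

```bash
# Clone the project and enter it (repository URL not given here)
git clone <repo-url> && cd <repo-dir>

# Build and start the Docker container in the background
docker-compose up -d

# Install one of the bundled environments, e.g. Sushi Go
bash ./scripts/install_env.sh sushigo

# Launch self-play PPO training across 10 MPI worker processes
mpirun -np 10 python3 train.py -e sushigo
```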
Highlighted Details
Custom environments must implement a standard interface (`step`, `reset`, `render`, `observation`, `legal_actions`); a hypothetical skeleton is sketched below. The repo provides `test.py` for playing against trained agents or baselines and `train.py` for self-play training.
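The skeleton below shows what such an environment might look like. Only the five method/property names come from the summary; the class name, game state, and toy game logic are invented for illustration:

```python
import numpy as np


class CustomGameEnv:
    """Toy two-player environment implementing the interface above."""

    def __init__(self):
        self.n_players = 2
        self.n_actions = 4
        self.reset()

    def reset(self):
        # Clear the game state and return the opening observation.
        self.board = np.zeros(self.n_actions, dtype=np.float32)
        self.current_player = 0
        return self.observation

    def step(self, action):
        # Apply the current player's move, advance the turn, and return
        # the usual (observation, reward, done, info) tuple.
        assert action in self.legal_actions
        self.board[action] = 1.0
        done = bool(self.board.all())  # game ends when every slot is taken
        reward = 1.0 if done else 0.0  # toy reward for the finishing move
        self.current_player = (self.current_player + 1) % self.n_players
        return self.observation, reward, done, {}

    def render(self):
        print(f"player {self.current_player} to move, board={self.board}")

    @property
    def observation(self):
        # Feature vector describing the state from the current player's view.
        return self.board.copy()

    @property
    def legal_actions(self):
        # Indices of the moves still available to the current player.
        return [i for i in range(self.n_actions) if self.board[i] == 0.0]
```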
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The documentation for `test.py` and `train.py` command-line arguments is noted as incomplete, with further documentation planned.