torchforge  by meta-pytorch

Agentic RL library for scalable PyTorch experimentation

Created 10 months ago
674 stars

Top 49.8% on SourcePulse

Summary

Torchforge is a PyTorch-native library designed to simplify Reinforcement Learning (RL) experimentation by abstracting infrastructure concerns, allowing researchers to focus on algorithms. It offers a scalable implementation of RL abstractions, catering to both rapid research and power-user hackability. However, development has been paused, with efforts consolidated into torchtitan.

How It Works

The library provides clear RL abstractions and a single, scalable implementation. It enables fine-grained control over distributed training aspects like placement, fault handling, and communication patterns, while also allowing users to ignore infrastructure when desired. This approach supports shifting between asynchronous and synchronous training modes across thousands of GPUs.
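
As a rough illustration of the synchronous/asynchronous distinction described above, the toy sketch below contrasts the two modes. It does not use torchforge's API: run_sync, run_async, and the staleness bookkeeping are hypothetical names invented for this example, and the "policy" is only an integer version counter. In the synchronous loop every rollout is trained on by the same policy version that produced it; in the asynchronous loop a generator and a trainer run concurrently through a bounded queue, so rollouts may lag the trainer by a few versions.

```python
import asyncio


def run_sync(num_steps: int) -> list[int]:
    """Synchronous mode: generate with the current policy, then train on it."""
    version = 0
    staleness = []
    for _ in range(num_steps):
        rollout_version = version                    # rollout from the current policy
        staleness.append(version - rollout_version)  # always 0 in this mode
        version += 1                                 # one toy "optimizer step"
    return staleness


async def run_async(num_steps: int, max_queue: int) -> list[int]:
    """Asynchronous mode: generator and trainer overlap via a bounded queue.

    max_queue should be >= 1 (asyncio treats 0 as "unbounded").
    Returns, per training step, how many versions old the rollout was.
    """
    queue: asyncio.Queue = asyncio.Queue(maxsize=max_queue)
    state = {"version": 0}
    staleness: list[int] = []

    async def generator() -> None:
        for _ in range(num_steps):
            # Tag each rollout with the policy version that produced it.
            await queue.put(state["version"])

    async def trainer() -> None:
        for _ in range(num_steps):
            rollout_version = await queue.get()
            staleness.append(state["version"] - rollout_version)
            state["version"] += 1                    # one toy "optimizer step"

    await asyncio.gather(generator(), trainer())
    return staleness


if __name__ == "__main__":
    print("sync :", run_sync(6))
    print("async:", asyncio.run(run_async(6, max_queue=2)))
```

In this toy, the queue bound caps how far rollouts can lag behind the trainer. A real system would also need to synchronize weights between trainer and generator, which is omitted here.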

Quick Start & Requirements

  • Prerequisites: PyTorch 2.9.0 with Monarch, vLLM, and torchtitan; Python 3.12.
  • Installation:
    • Conda: conda create -n forge python=3.12 && conda activate forge && ./scripts/install.sh
    • ROCm: conda create -n forge python=3.12 && conda activate forge && ./scripts/install_rocm.sh (for ROCm 7.x, PYTORCH_ROCM_ARCH and ROCM_VERSION must be set manually; the script defaults to nightly wheels; RDMA and distributed tensor features are disabled).
    • Pixi: curl -fsSL https://pixi.sh/install.sh | bash, then pixi run install (Conda is the recommended route; uv support is incomplete).
  • Example Run: python -m apps.grpo.main --config apps/grpo/qwen3_1_7b.yaml (requires minimum 2 GPUs).
  • Docs: https://meta-pytorch.org/torchforge
  • Tutorials: Coming soon.
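
Before launching the example run, it can help to confirm the two hard requirements listed above: Python 3.12 and at least 2 GPUs. The following is a minimal sketch of such a check, not part of torchforge; preflight and detect_gpu_count are hypothetical helper names, and treating any Python version other than 3.12 as a problem is an assumption based on the prerequisites line.

```python
import sys

REQUIRED_PYTHON = (3, 12)  # from the prerequisites listed above
MIN_GPUS = 2               # minimum stated for the GRPO example run


def preflight(python_version: tuple[int, int], gpu_count: int) -> list[str]:
    """Return a list of human-readable problems; empty means the checks pass."""
    problems = []
    if python_version != REQUIRED_PYTHON:
        problems.append(
            f"expected Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    if gpu_count < MIN_GPUS:
        problems.append(f"need at least {MIN_GPUS} GPUs, found {gpu_count}")
    return problems


def detect_gpu_count() -> int:
    """Count visible GPUs via PyTorch; report 0 if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count()


if __name__ == "__main__":
    issues = preflight(tuple(sys.version_info[:2]), detect_gpu_count())
    for issue in issues:
        print("preflight:", issue)
    sys.exit(1 if issues else 0)
```

Run inside the activated forge environment, the script exits with a non-zero status if either requirement is not met.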

Highlighted Details

  • PyTorch-native agentic RL library.
  • Isolates RL loop from infrastructure for rapid research.
  • Enables modification of RL loop components for power users.
  • Scalable training across thousands of GPUs with flexible sync/async modes.

Maintenance & Community

Development of torchforge has been paused, with efforts consolidated into torchtitan. The README provides no community links (Discord, Slack, etc.) or roadmap details.

Licensing & Compatibility

The source code is licensed under the BSD 3-Clause license. Users should be aware that third-party data and models linked from the repository may carry their own legal obligations.

Limitations & Caveats

Development is paused, with the project's future consolidated into torchtitan. ROCm builds set USE_TENSOR_ENGINE=0, which disables RDMA and distributed tensor features. Pure uv installation via Pixi is not yet fully functional. Tutorials are not yet available.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 2
  • Star History: 17 stars in the last 30 days
