OpenTinker by open-tinker

RL-as-a-Service infrastructure for foundation models

Created 3 weeks ago

544 stars

Top 58.5% on SourcePulse

Project Summary

Summary

OpenTinker provides an RL-as-a-Service infrastructure designed to democratize agentic reinforcement learning for foundation models. It offers a streamlined platform for researchers and developers to implement, train, and deploy RL agents, simplifying complex setups and accelerating development.

How It Works

The core innovation is a flexible environment design framework that classifies scenarios along two dimensions: Data Source (Data-Dependent vs. Data-Free) and Interaction Mode (Single-Turn vs. Multi-Turn). This 2x2 paradigm yields four distinct training approaches, covering learning objectives from simple QA tasks to complex game-playing agents.
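The 2x2 classification can be sketched in a few lines of Python. This is only an illustrative model of the two dimensions named above, not OpenTinker's API: the names `DataSource`, `InteractionMode`, `Scenario`, and `all_scenarios` are hypothetical and were chosen for this example.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class DataSource(Enum):
    """First dimension: where training inputs come from (hypothetical name)."""
    DATA_DEPENDENT = "data-dependent"
    DATA_FREE = "data-free"


class InteractionMode(Enum):
    """Second dimension: how many turns an episode spans (hypothetical name)."""
    SINGLE_TURN = "single-turn"
    MULTI_TURN = "multi-turn"


@dataclass(frozen=True)
class Scenario:
    """One cell of the 2x2 grid (hypothetical type, not part of OpenTinker)."""
    data_source: DataSource
    interaction_mode: InteractionMode

    def label(self) -> str:
        return f"{self.data_source.value} / {self.interaction_mode.value}"


def all_scenarios() -> list[Scenario]:
    """Enumerate every combination of the two dimensions."""
    return [Scenario(d, m) for d, m in product(DataSource, InteractionMode)]


if __name__ == "__main__":
    for scenario in all_scenarios():
        print(scenario.label())
```

Enumerating the Cartesian product of the two enums gives exactly four scenarios, one per training approach. Which concrete task falls into which cell is left open here, since the summary does not state that mapping.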

Quick Start & Requirements

Installation involves cloning the repository with its submodules (git clone --recurse-submodules), then installing the core package (pip install -e .) and the verl component (cd verl; pip install -e .). The recommended server setup uses Docker with GPU access (docker run ... --gpus all); manual server installation is possible but may cause version conflicts. Authentication is configured in opentinker/scheduler/config/scheduler.yaml. Links to examples, Project Page, DeepWiki, and Slack are available.

Highlighted Details

The project supports various agentic RL tasks, including LLM and VLM applications for mathematical problem-solving (single- and multi-turn, with LoRA options) and multi-turn agents for games such as Gomoku and AlfWorld. Performance tracking is integrated via Weights & Biases (wandb).

Maintenance & Community

Community support is available via a Slack channel. The provided README does not detail core contributors, development activity, or a public roadmap.

Licensing & Compatibility

The provided README does not specify a software license. The license must be clarified before commercial use or closed-source integration can be assessed.

Limitations & Caveats

The client currently depends on a subset of verl functions. This dependency is transitional and is planned to be removed so that the client stays lightweight. Installing server dependencies manually risks version conflicts, so the Docker approach is preferable for stability.

Health Check

Last Commit: 3 days ago
Responsiveness: Inactive
Pull Requests (30d): 9
Issues (30d): 12
Star History: 550 stars in the last 21 days
