Gen-Verse: Personalize AI agents through conversational reinforcement learning
Top 30.4% on SourcePulse
OpenClaw-RL empowers self-hosted AI agents with personalization capabilities through continuous reinforcement learning from natural conversation feedback. It targets engineers and researchers seeking to enhance LLM agents without interrupting live usage, offering a privacy-preserving, asynchronous framework that transforms dialogue into actionable training signals for improved agent performance over time.
How It Works
The framework employs a fully asynchronous, 4-component architecture (serving, rollout collection, PRM judging, policy training) to decouple processes, allowing continuous background optimization without blocking user interactions. It automatically converts multi-turn conversations into training signals by classifying turns and using subsequent messages as state feedback. Two distinct learning paradigms are supported: Binary RL (GRPO) leverages a Process Reward Model (PRM) for scalar rewards, while On-Policy Distillation (OPD) uses hindsight-derived textual hints to guide policy updates via an "enhanced teacher" model, offering richer directional learning.
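The turn-to-signal conversion described above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual API: the `TrainingSignal` type and `conversation_to_signals` helper are hypothetical, showing only the idea that each assistant turn is paired with the user's subsequent message as state feedback (which a PRM would then score for Binary RL, or which OPD would turn into a textual hint).

```python
from dataclasses import dataclass

@dataclass
class TrainingSignal:
    prompt: str    # conversation context up to the assistant turn
    response: str  # the assistant turn being evaluated
    feedback: str  # the user's subsequent message, used as state feedback

def conversation_to_signals(turns: list[dict]) -> list[TrainingSignal]:
    """Pair each assistant turn with the next user message that follows it."""
    signals = []
    for i, turn in enumerate(turns):
        if turn["role"] != "assistant":
            continue
        # The next user message, if any, serves as feedback on this turn.
        followups = [t for t in turns[i + 1:] if t["role"] == "user"]
        if not followups:
            continue
        context = "\n".join(t["content"] for t in turns[:i])
        signals.append(
            TrainingSignal(context, turn["content"], followups[0]["content"])
        )
    return signals

# Example: one assistant turn, judged by the user's follow-up complaint.
dialogue = [
    {"role": "user", "content": "Summarize this log file."},
    {"role": "assistant", "content": "Here is a detailed summary..."},
    {"role": "user", "content": "Too long, give me three bullets."},
]
signals = conversation_to_signals(dialogue)
```

Because extraction runs over completed conversations, this step can happen entirely in the background, which is what allows the asynchronous design to avoid blocking live usage.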
Quick Start & Requirements
Setup requires a robust environment: 8x GPUs (configurable via environment variables like NUM_GPUS, ACTOR_GPUS, ROLLOUT_GPUS, PRM_GPUS) running CUDA 12.9 and Python 3.12, with the Slime RL framework as a prerequisite. Core execution involves navigating to the slime directory and running specific bash scripts like ../openclaw-rl/run_qwen3_4b_openclaw_rl.sh for Binary RL or ../openclaw-opd/run_qwen3_4b_openclaw_opd.sh for OPD. The system exposes an OpenAI-compatible API endpoint at http://<HOST_IP>:30000/v1 for integration with OpenClaw. Detailed environment setup instructions are available in ./instructions/README.md.
Maintenance & Community
The project roadmap indicates ongoing development with planned enhancements for broader model support and scalable infrastructure. No specific community channels (e.g., Discord, Slack) or notable contributors are detailed in the provided README.
Licensing & Compatibility
License information is not specified in the provided README. Until a license is published, commercial use or integration into closed-source projects carries legal uncertainty and warrants direct confirmation from the maintainers.
Limitations & Caveats
The system has significant hardware demands, defaulting to 8 GPUs, and requires specific software versions (CUDA 12.9, Python 3.12). Its reliance on the Slime framework and the lack of explicit licensing information present potential adoption blockers. The project appears to be in active development, as indicated by its roadmap.