Gen-Verse
Reinforcement learning for LLM agents
This project introduces RLAnything and DemyAgent, frameworks addressing dynamic optimization and agentic reasoning for Large Language Models (LLMs). It provides researchers and developers with tools to achieve state-of-the-art performance in complex agentic scenarios, even with smaller model architectures, by enhancing reinforcement learning loops and agentic reasoning strategies.
How It Works
RLAnything employs a closed-loop optimization system that dynamically refines the policy, the reward model, and the environment together. This joint optimization amplifies learning signals by combining step-wise feedback from the optimized reward model with critic feedback used to adapt the environment, an approach that outperforms outcome-only signals.
DemyAgent focuses on agentic RL. It trains on high-diversity, real end-to-end trajectories, uses exploration-friendly techniques such as reward clipping and entropy maintenance, and favors deliberative reasoning with selective tool calls over verbose self-reasoning.
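The ideas named above (step-wise feedback alongside an outcome signal, reward clipping, and entropy maintenance) can be illustrated with a minimal sketch. This is not the project's implementation: the function names, the linear blending rule, the clipping bounds, and the coefficients `weight` and `entropy_coef` are all hypothetical choices made for illustration, since the summary does not specify the formulas used.

```python
import math
from typing import List


def blend_rewards(step_rewards: List[float], outcome: float,
                  weight: float = 0.5) -> List[float]:
    """Mix per-step reward-model scores with the final outcome reward.

    Assumption: a simple linear blend; `weight` is the share of each
    step's signal that comes from the outcome.
    """
    return [(1.0 - weight) * r + weight * outcome for r in step_rewards]


def clip_reward(reward: float, low: float = -1.0, high: float = 1.0) -> float:
    """Clamp a reward into [low, high] so outliers do not dominate updates."""
    return max(low, min(high, reward))


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def shaped_signal(reward: float, probs: List[float],
                  entropy_coef: float = 0.01) -> float:
    """Clipped reward plus an entropy bonus that rewards staying exploratory.

    Assumption: an additive bonus is one common way to maintain entropy;
    the project may use a different mechanism.
    """
    return clip_reward(reward) + entropy_coef * entropy(probs)


if __name__ == "__main__":
    # Two intermediate steps scored by a reward model, with a successful outcome.
    print(blend_rewards([0.2, 0.8], outcome=1.0))
    # A large raw reward is clipped before the entropy bonus is added.
    print(shaped_signal(3.0, [0.5, 0.5]))
```

In this sketch, a uniform action distribution earns a larger bonus than a near-deterministic one, which is the intuition behind keeping entropy from collapsing during training.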
Quick Start & Requirements
Run git clone https://github.com/Gen-Verse/Open-AgentRL.git, then set up a conda environment (Python 3.10 for RLAnything, 3.11 for DemyAgent) and install with pip install -e .[vllm] or pip install -r requirements_rlanything.txt.
Includes vllm, sglang, mcore, and fsdp support. Requires specific datasets (SFT, RL, and benchmarks such as AIME2024/2025, GPQA-Diamond, and LiveCodeBench-v6) and base LLMs (e.g., Qwen2.5-7B-Instruct, Qwen3-4B-Instruct-2507). Cloud service integration (Volcano Engine Cloud FaaS as a code execution sandbox) is recommended for certain tasks.
Highlighted Details
Maintenance & Community
No explicit information regarding community channels (Discord/Slack), active contributors, sponsorships, or roadmap is provided in the README.
Licensing & Compatibility
The README does not explicitly state the project's license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Training and evaluation require substantial computational resources and specific cloud infrastructure setups (e.g., Volcengine Cloud). The project relies on external codebases (VeRL, ReTool) and specific model checkpoints, necessitating careful dependency management.