capgym: Code-as-Policy agents for robot manipulation
Top 76.0% on SourcePulse
Summary
CaP-X is an open-access framework for systematically studying, benchmarking, and improving Code-as-Policy (CaP) agents for robot manipulation, in which an agent controls the robot by writing programs. It provides tools to evaluate agent performance across tasks of varying complexity and across input modalities, with the aim of advancing the control capabilities of AI agents and supporting reproducible research.
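To make the Code-as-Policy idea concrete, here is a minimal sketch in which a policy is a short Python program that calls perception and control primitives. Everything in it is hypothetical: the toy environment, the primitives (`get_object_pose`, `move_to`, `grasp`) and the hard-coded program are illustrative stand-ins, not the CaP-X API. In a real agent the program would be generated by an LLM and executed against one of the CaP-Gym environments.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPickEnv:
    """Hypothetical stand-in for a manipulation environment with one cube."""

    cube_pos: tuple = (0.4, 0.1, 0.02)
    gripper_pos: tuple = (0.0, 0.0, 0.3)
    holding: bool = False
    log: list = field(default_factory=list)

    # Hypothetical primitives that the generated program is allowed to call.
    def get_object_pose(self, name: str) -> tuple:
        if name != "cube":
            raise KeyError(f"unknown object: {name}")
        return self.cube_pos

    def move_to(self, pos: tuple) -> None:
        self.gripper_pos = tuple(pos)
        if self.holding:  # a held cube moves with the gripper
            self.cube_pos = tuple(pos)
        self.log.append(("move_to", tuple(pos)))

    def grasp(self) -> bool:
        # The grasp succeeds only if the gripper is at the cube's position.
        at_cube = all(abs(g - c) < 1e-3 for g, c in zip(self.gripper_pos, self.cube_pos))
        self.holding = at_cube
        self.log.append(("grasp", at_cube))
        return at_cube

    def success(self) -> bool:
        # Toy task: hold the cube more than 10 cm above the table.
        return self.holding and self.cube_pos[2] > 0.1


# In a CaP agent this string would come from an LLM prompted with the
# primitive signatures; here it is hard-coded for illustration.
POLICY_CODE = """
x, y, z = get_object_pose("cube")
move_to((x, y, z))
if grasp():
    move_to((x, y, z + 0.2))
"""


def run_policy(code: str, env: ToyPickEnv) -> bool:
    """Execute generated policy code with only the primitives in scope."""
    api = {
        "get_object_pose": env.get_object_pose,
        "move_to": env.move_to,
        "grasp": env.grasp,
    }
    # Emptying __builtins__ narrows what the program can call, but it is
    # NOT a security sandbox; untrusted code needs real isolation.
    exec(code, {"__builtins__": {}}, api)
    return env.success()


if __name__ == "__main__":
    print(run_policy(POLICY_CODE, ToyPickEnv()))  # True
```

In a multi-turn setting, the agent would inspect the outcome (here `env.log` and `success()`), revise the program, and run it again.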
How It Works
CaP-Gym offers 39 interactive Gymnasium environments for code-based robot control, spanning the Robosuite, LIBERO-PRO, and BEHAVIOR simulators. CaP-Bench is an 8-tier benchmark suite that evaluates agents across abstraction levels, interaction modes, and visual grounding. CaP-Agent0 is a training-free agentic framework featuring multi-turn visual differencing and auto-synthesized skill libraries. CaP-RL trains coding agents with reinforcement learning via GRPO, enabling sim-to-real transfer with a minimal sim-to-real gap.
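For CaP-RL, the central piece of GRPO is its group-relative advantage: several programs are sampled for the same task, each is scored, and each score is normalized against the group's mean and standard deviation, so no learned value function is needed. The sketch below assumes a binary task-success reward per executed program (an assumption, not CaP-RL's documented reward) and shows the PPO-style clipped surrogate that consumes the advantage; the KL penalty toward a reference policy that GRPO also applies is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each sample's reward against its group (same prompt/task).

    The group mean acts as the baseline, and the group standard deviation
    rescales the result. If all rewards are equal, every advantage is 0.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for one sample (to be maximized).

    ratio = pi_new(y | x) / pi_old(y | x) for the sampled program y.
    """
    clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical rewards: 4 sampled programs, 2 of which solved the task.
    advs = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
    print([round(a, 3) for a in advs])  # [1.0, -1.0, -1.0, 1.0]
```

Because the baseline comes from sibling samples, programs that succeed where their siblings fail get a positive advantage, and a group where every program succeeds (or every one fails) contributes no gradient.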
Quick Start & Requirements
Installation uses uv for dependency management and requires Python 3.10 and a CUDA-capable GPU. Clone the repository with its submodules, install uv, and run uv sync.
Each simulator needs its own setup: Robosuite (uv sync --extra robosuite), LIBERO-PRO (a separate Python 3.12 virtual environment), and BEHAVIOR (Python 3.10 and CUDA 12.x; cd capx/third_party/b1k && ./uv_install.sh --dataset). The BEHAVIOR setup requires accepting the NVIDIA Isaac Sim EULA and the BEHAVIOR dataset licenses.
The SAM3 perception server weights require HuggingFace authentication, and an LLM proxy (e.g., OpenRouter) is mandatory.
Documentation for LIBERO-PRO, BEHAVIOR tasks, and configuration is available in the repository. The project paper is available at arXiv:2603.22435.
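Before running uv sync, a quick preflight check can catch missing prerequisites. This is a hypothetical helper, not part of the repository: `nvidia-smi` on PATH is only a rough proxy for a CUDA-capable GPU, and `OPENROUTER_API_KEY` is an assumed variable name, since this summary does not say how CaP-X reads its proxy credentials.

```python
import os
import shutil
import sys


def hf_token_available() -> bool:
    """True if HuggingFace credentials appear to be configured.

    huggingface_hub reads HF_TOKEN, or a token file saved by
    `huggingface-cli login` under $HF_HOME (default ~/.cache/huggingface).
    """
    if os.environ.get("HF_TOKEN"):
        return True
    hf_home = os.environ.get("HF_HOME", os.path.expanduser("~/.cache/huggingface"))
    return os.path.exists(os.path.join(hf_home, "token"))


def preflight() -> dict[str, bool]:
    """Return a pass/fail map for the prerequisites listed in the Quick Start."""
    return {
        # The base environment targets Python 3.10 (LIBERO-PRO uses its own 3.12 venv).
        "python_3_10": sys.version_info[:2] == (3, 10),
        # uv must be on PATH for `uv sync`.
        "uv_on_path": shutil.which("uv") is not None,
        # nvidia-smi on PATH suggests an NVIDIA driver is installed (a proxy only).
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # Needed to download the SAM3 perception server weights.
        "hf_credentials": hf_token_available(),
        # Hypothetical variable name for the LLM proxy key (e.g., OpenRouter).
        "llm_proxy_key": bool(os.environ.get("OPENROUTER_API_KEY")),
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'OK  ' if ok else 'MISS'} {check}")
```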
Maintenance & Community
The project involves researchers from NVIDIA, UC Berkeley, Stanford, and CMU. The README lists no community channels (Discord, Slack), roadmap, or social media links.
Licensing & Compatibility
Released under the MIT License, which permits commercial use and integration into closed-source applications, provided the copyright and license notice are retained.
Limitations & Caveats
Conflicting simulator dependencies (Robosuite 1.5.0 versus the Robosuite 1.4.0 required by LIBERO) make separate virtual environments necessary. The BEHAVIOR simulator needs specific environment variables (e.g., OMNIGIBSON_GPU_ID) for multi-GPU and headless operation. SAM3 weights require prior HuggingFace authentication.
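The two environment caveats can be handled from Python along these lines. `installed_version` uses the standard `importlib.metadata` API to confirm which Robosuite a virtual environment actually contains, and `behavior_worker_env` builds a per-process environment that pins one BEHAVIOR worker to one GPU. `OMNIGIBSON_GPU_ID` comes from the README; `OMNIGIBSON_HEADLESS` and all helper names are assumptions for illustration.

```python
import os
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed distribution version, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_robosuite(expected: str) -> bool:
    """True if this venv has the Robosuite version the target simulator expects.

    Per the README, Robosuite 1.5.0 and LIBERO's 1.4.0 conflict, so each
    simulator should be run from its own virtual environment.
    """
    return installed_version("robosuite") == expected


def behavior_worker_env(gpu_id: int, headless: bool = True,
                        base: Optional[dict] = None) -> dict:
    """Build an environment mapping that pins one BEHAVIOR worker to one GPU.

    OMNIGIBSON_GPU_ID is named in the README; OMNIGIBSON_HEADLESS is an
    assumption about how headless mode is switched on.
    """
    env = dict(os.environ if base is None else base)
    env["OMNIGIBSON_GPU_ID"] = str(gpu_id)
    env["OMNIGIBSON_HEADLESS"] = "1" if headless else "0"
    return env


if __name__ == "__main__":
    print("robosuite:", installed_version("robosuite"))
    for gpu in range(2):  # e.g., two GPUs
        print(gpu, behavior_worker_env(gpu)["OMNIGIBSON_GPU_ID"])
```

Each returned mapping can be passed as the `env` argument of `subprocess.Popen` to start one evaluation process per GPU; the launch command itself depends on the repository's scripts and is not shown.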
allenai
openai
Unity-Technologies
microsoft