cap-x by capgym

Code-as-Policy agents for robot manipulation

Created 3 months ago

616 stars

Top 52.7% on SourcePulse

Project Summary

CaP-X is an open-access framework for systematically studying, benchmarking, and improving Code-as-Policy (CaP) agents in robot manipulation. It provides tools to evaluate agent performance across diverse complexities and modalities, advancing AI agents' control capabilities and fostering reproducible research.

How It Works

CaP-X has four components. CaP-Gym offers 39 interactive Gymnasium environments across Robosuite, LIBERO-PRO, and BEHAVIOR for code-based robot control. CaP-Bench provides an 8-tier benchmark suite that evaluates abstraction levels, interaction modes, and visual grounding. CaP-Agent0 is a training-free agentic framework with multi-turn visual differencing and auto-synthesized skill libraries. CaP-RL applies reinforcement learning via GRPO to coding agents, enabling sim-to-real transfer with a minimal gap.
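To make the code-as-policy idea concrete, here is a minimal sketch of a multi-turn loop in which a model writes code, the code runs against a robot API, and the resulting state is fed back for the next attempt. Everything in it is hypothetical: ToyReachEnv, fake_llm, run_episode, and the move_by/get_obs API are illustrative stand-ins and not CaP-Gym or CaP-Agent0 interfaces.

```python
from typing import Callable, Dict, List, Tuple


class ToyReachEnv:
    """Hypothetical 1-D reaching task; a stand-in, not a CaP-Gym environment."""

    def __init__(self, target: float = 0.5) -> None:
        self.target = target
        self.gripper = 0.0

    def reset(self) -> Dict[str, float]:
        self.gripper = 0.0
        return self.get_obs()

    def get_obs(self) -> Dict[str, float]:
        return {"gripper": self.gripper, "target": self.target}

    def move_by(self, dx: float) -> None:
        self.gripper += dx

    def success(self) -> bool:
        return abs(self.gripper - self.target) < 1e-6


def fake_llm(task: str, feedback: List[str]) -> str:
    """Placeholder for a real LLM call: returns policy code as a string."""
    if not feedback:
        return "move_by(0.3)"  # first attempt deliberately undershoots
    # With feedback available, read the state and close the remaining gap.
    return "obs = get_obs()\nmove_by(obs['target'] - obs['gripper'])"


def run_episode(
    env: ToyReachEnv,
    llm: Callable[[str, List[str]], str],
    max_turns: int = 3,
) -> Tuple[bool, int]:
    """Run generated code turn by turn until success or the turn budget ends."""
    env.reset()
    feedback: List[str] = []
    for turn in range(1, max_turns + 1):
        code = llm("Move the gripper to the target.", feedback)
        # Expose the robot API to the generated code (builtins stay available).
        # exec() is not a sandbox; untrusted code needs real isolation.
        namespace = {"move_by": env.move_by, "get_obs": env.get_obs}
        try:
            exec(code, namespace)
        except Exception as exc:
            feedback.append(f"turn {turn}: error {exc!r}")
            continue
        if env.success():
            return True, turn
        feedback.append(f"turn {turn}: state {env.get_obs()}")
    return False, max_turns
```

Calling run_episode(ToyReachEnv(), fake_llm) returns (True, 2): the first program undershoots, and the second uses the fed-back state to finish the task. A single-turn setting corresponds to max_turns=1.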

Quick Start & Requirements

Installation uses uv for dependency management and requires Python 3.10 and a CUDA-capable GPU. Clone the repository with its submodules, install uv, and run uv sync. Each simulator needs its own install step: Robosuite (uv sync --extra robosuite), LIBERO-PRO (a separate Python 3.12 venv), and BEHAVIOR (Python 3.10, CUDA 12.x; cd capx/third_party/b1k && ./uv_install.sh --dataset). BEHAVIOR setup requires accepting the NVIDIA Isaac Sim EULA and the BEHAVIOR dataset licenses. Downloading the SAM3 perception server weights requires HuggingFace authentication. An LLM proxy (e.g., OpenRouter) is mandatory. Documentation for LIBERO-PRO, BEHAVIOR tasks, and configuration is available within the repository. The project paper is available at arXiv:2603.22435.

Highlighted Details

Supports 39 robot manipulation tasks across Robosuite, LIBERO-PRO, and BEHAVIOR.
Features an 8-tier benchmark suite (S1-S4 single-turn, M1-M4 multi-turn).
CaP-Agent0 is a training-free agent that uses multi-turn visual differencing and auto-synthesized skill libraries.
CaP-RL enables effective sim-to-real transfer for coding agents using GRPO.
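For readers unfamiliar with GRPO, the sketch below shows its core step as it is commonly described: several completions are sampled for the same prompt, and each one's reward is normalized against the group's mean and standard deviation to give an advantage. The function name, the use of the population standard deviation, and the binary success rewards in the usage note are assumptions for illustration; the summary does not describe CaP-RL's actual reward or training setup.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each reward against its own sampling group.

    rewards: one scalar per sampled completion for the same prompt.
    eps: small constant that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some codebases use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With rewards [1.0, 0.0, 0.0, 1.0] (say, task success for four generated programs), the advantages are approximately [1, -1, -1, 1]. When every sample in a group earns the same reward, all advantages are zero and that group contributes no learning signal.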

Maintenance & Community

The project involves researchers from NVIDIA, UC Berkeley, Stanford, and CMU. The README provides no community channels (Discord, Slack), roadmap, or social media links.

Licensing & Compatibility

Released under the MIT License, which permits commercial use and integration into closed-source applications, provided the copyright and license notice are retained.

Limitations & Caveats

Simulator dependency conflicts (Robosuite 1.5.0 vs. LIBERO's 1.4.0) necessitate separate virtual environments. The BEHAVIOR simulator requires specific environment variables to be set (e.g., OMNIGIBSON_GPU_ID) for multi-GPU and headless operation. SAM3 weights require prior HuggingFace authentication.
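As an illustration of the environment-variable caveat, the following sketch builds a per-process environment for a BEHAVIOR worker pinned to one GPU. OMNIGIBSON_GPU_ID comes from the text above; OMNIGIBSON_HEADLESS is the variable OmniGibson reads for headless mode, but whether CaP-X relies on it is an assumption, and the helper name behavior_env is hypothetical.

```python
import os
from typing import Dict, Optional


def behavior_env(
    gpu_id: int,
    headless: bool = True,
    base: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with BEHAVIOR/OmniGibson settings added."""
    env = dict(os.environ if base is None else base)
    env["OMNIGIBSON_GPU_ID"] = str(gpu_id)  # variable named in the caveat above
    if headless:
        env["OMNIGIBSON_HEADLESS"] = "1"  # assumption: not confirmed by the summary
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run or subprocess.Popen, so that each worker process sees its own GPU index without changing the parent shell's environment.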
