cap-x by capgym

Code-as-Policy agents for robot manipulation

Created 2 weeks ago

373 stars

Top 76.0% on SourcePulse

Project Summary

CaP-X is an open-access framework for systematically studying, benchmarking, and improving Code-as-Policy (CaP) agents in robot manipulation. It provides tools to evaluate agent performance across diverse complexities and modalities, advancing AI agents' control capabilities and fostering reproducible research.

How It Works

CaP-Gym offers 39 interactive Gymnasium environments across Robosuite, LIBERO-PRO, and BEHAVIOR for code-based robot control. CaP-Bench provides an 8-tier benchmark suite that evaluates abstraction levels, interaction modes, and visual grounding. CaP-Agent0 is a training-free agentic framework with multi-turn visual differencing and auto-synthesized skill libraries. CaP-RL applies reinforcement learning to coding agents via GRPO, enabling sim-to-real transfer with a minimal gap.
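To make the GRPO step more concrete, here is a minimal sketch of the group-relative advantage computation that GRPO is built on: several rollouts are sampled for the same task, each is scored, and the rewards are normalized within that group. The function name and the reward values are illustrative assumptions, not CaP-RL's actual code, and implementations differ on details such as population versus sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of rollouts for the same task.

    Each advantage is (reward - group mean) / (group std + eps).
    Population standard deviation is used here; this is a choice made
    for the sketch, not something the source specifies.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical example: four generated programs for one task,
# each scored 1.0 on task success and 0.0 on failure.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Successful rollouts receive positive advantages and failed ones negative advantages. When every rollout in a group earns the same reward, all advantages are zero and the group contributes no learning signal.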

Quick Start & Requirements

Installation uses uv for dependency management and requires Python 3.10 and a CUDA-capable GPU. Clone the repository with its submodules, install uv, and run uv sync. Each simulator needs an additional setup step:

  • Robosuite: uv sync --extra robosuite
  • LIBERO-PRO: a separate Python 3.12 virtual environment
  • BEHAVIOR: Python 3.10 and CUDA 12.x; run cd capx/third_party/b1k && ./uv_install.sh --dataset

BEHAVIOR setup requires accepting the NVIDIA Isaac Sim EULA and the BEHAVIOR dataset licenses. Downloading the SAM3 perception server weights requires HuggingFace authentication. An LLM proxy (e.g., OpenRouter) is mandatory. Documentation for LIBERO-PRO, BEHAVIOR tasks, and configuration is available within the repository. The project paper is available at arXiv:2603.22435.
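Before running the install steps, it can help to confirm the stated prerequisites. The following is a minimal preflight sketch and not part of the project. The helper names are hypothetical, and treating the presence of nvidia-smi on PATH as a stand-in for "a CUDA-capable GPU" is an assumption.

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10)  # base requirement stated above


def python_matches(version=None, required=REQUIRED_PYTHON):
    """Return True when the major.minor version equals the required one."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) == tuple(required)


def missing_tools(tools=("git", "uv", "nvidia-smi"), which=shutil.which):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    if not python_matches():
        print("Python 3.10 expected, found", sys.version.split()[0])
    for tool in missing_tools():
        print("Not found on PATH:", tool)
```

The `which` parameter exists only so the lookup can be swapped out in tests. The LIBERO-PRO environment would need `required=(3, 12)` instead.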

Highlighted Details

  • Supports 39 robot manipulation tasks across Robosuite, LIBERO-PRO, and BEHAVIOR.
  • Features an 8-tier benchmark suite (S1-S4 single-turn, M1-M4 multi-turn).
  • CaP-Agent0 offers training-free capabilities with multi-turn visual differencing and auto-synthesized skill libraries.
  • CaP-RL enables effective sim-to-real transfer for coding agents using GRPO.
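The tier naming above (S1-S4 single-turn, M1-M4 multi-turn) can be enumerated programmatically, for example to drive an evaluation sweep. This is a small sketch; the `Tier` class and its fields are hypothetical and do not reflect how CaP-Bench represents tiers internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str         # tier label, e.g. "S1" or "M3"
    multi_turn: bool  # True for the M tiers, False for the S tiers


# Build S1..S4 followed by M1..M4, eight tiers in total.
TIERS = [
    Tier(name=f"{prefix}{index}", multi_turn=(prefix == "M"))
    for prefix in ("S", "M")
    for index in range(1, 5)
]

tier_names = [tier.name for tier in TIERS]
```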

Maintenance & Community

The project involves researchers from NVIDIA, UC Berkeley, Stanford, and CMU. The README provides no community channels (Discord, Slack), roadmaps, or social media links.

Licensing & Compatibility

Released under the MIT License, which permits commercial use and integration into closed-source applications, provided the copyright and license notice is retained.

Limitations & Caveats

Simulator dependency conflicts (Robosuite 1.5.0 vs. LIBERO's 1.4.0) necessitate separate virtual environments. The BEHAVIOR simulator requires specific environment variables (e.g., OMNIGIBSON_GPU_ID) for multi-GPU and headless operation. Downloading the SAM3 weights requires prior HuggingFace authentication.
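One way to handle the environment-variable requirement is to build the environment explicitly before launching a BEHAVIOR process. The sketch below is illustrative only. OMNIGIBSON_GPU_ID comes from the text above, while OMNIGIBSON_HEADLESS and the helper name `behavior_env` are assumptions that should be checked against the repository's BEHAVIOR documentation.

```python
import os


def behavior_env(gpu_id, headless=True, base=None):
    """Return a copy of the environment with BEHAVIOR-related variables set.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["OMNIGIBSON_GPU_ID"] = str(gpu_id)
    if headless:
        # Assumed variable name for headless mode; verify before relying on it.
        env["OMNIGIBSON_HEADLESS"] = "1"
    return env


# Hypothetical usage: pin a run to GPU 1 without a display, then pass
# `env` to subprocess.run(..., env=env) with the project's launch command.
env = behavior_env(gpu_id=1)
```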

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 8
Issues (30d): 3
Star History: 374 stars in the last 18 days
