realtime-vla  by dexmal

Real-time Vision-Language-Action (VLA) model inference

Created 5 months ago
490 stars

Top 62.9% on SourcePulse

Project Summary

This project provides accelerated inference kernels for the Pi0 VLA model, enabling real-time performance for applications requiring fast visual understanding and action. It targets researchers and developers in robotics and embodied AI, offering significant latency reductions for complex tasks, demonstrated by a real-world falling pen catch with sub-200ms latency.

How It Works

The architecture decomposes VLA computation into a vision encoder, an LLM, and an action expert, reducing the entire pipeline to 24 GEMM-like operations. Each of these is implemented with custom Triton kernels for efficient GPU execution, reaching inference rates of about 30Hz on an RTX 4090.

Quick Start & Requirements

  • Usage: Copy pi0_infer.py into your project. Use convert_from_jax.py to load checkpoints. Example Python API: infer.forward(normalized_observation_image_bfloat16, observation_state_bfloat16, diffusion_input_noise_bfloat16).
  • Prerequisites: torch and triton. The kernels are tuned for an RTX 4090 with CUDA 12.6.
  • Links: The citation points to the arXiv preprint arXiv:2510.26742.
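Before wiring the forward call into a control loop, it can help to check that it fits the camera frame period. The following is a minimal timing sketch, not code from the project: `measure_latency_ms` and `fake_forward` are hypothetical names, and `fake_forward` is only a stand-in for `infer.forward` so that the example runs without a GPU or a checkpoint.

```python
import time
from statistics import mean


def measure_latency_ms(fn, *args, n_warmup=3, n_iters=20):
    """Return the mean wall-clock latency of fn(*args) in milliseconds."""
    for _ in range(n_warmup):
        fn(*args)  # warm-up runs are discarded (JIT compilation, caches)
    samples = []
    for _ in range(n_iters):
        start = time.perf_counter()
        fn(*args)
        # With a real GPU model, call torch.cuda.synchronize() here before
        # reading the clock, because CUDA kernels launch asynchronously.
        samples.append((time.perf_counter() - start) * 1000.0)
    return mean(samples)


def fake_forward(image, state, noise):
    """Hypothetical stand-in for infer.forward; it does no real work."""
    return [0.0] * 8  # placeholder action vector, length chosen arbitrarily


latency_ms = measure_latency_ms(fake_forward, None, None, None)
frame_budget_ms = 1000.0 / 30.0  # one frame period at 30 fps
print(f"mean latency: {latency_ms:.3f} ms")
print(f"fits 30 fps budget: {latency_ms < frame_budget_ms}")
```

To time the real model, `fake_forward` would be replaced by `infer.forward` and the three `None` arguments by the bfloat16 tensors named in the usage line above.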

Highlighted Details

  • Achieves 30Hz VLA inference and 480Hz trajectory frequency.
  • Demonstrated sub-200ms end-to-end latency for tasks like catching a falling pen.
  • Inference times on RTX 4090: ~20ms (1 view, no prompt), ~39.2ms (3 views, 20-token prompt).
  • Recommended inference rates: 30fps for 1-2 views, 25fps for 3 views to match camera speeds.
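The recommended rates follow from simple frame-budget arithmetic, which this short sketch works through using only the figures listed above. The helper name `frame_period_ms` is illustrative. The final ratio is just the quotient of the two reported frequencies; this summary does not describe how the 480Hz trajectory is produced.

```python
def frame_period_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


# (configuration, reported RTX 4090 latency in ms, recommended rate in fps)
reported = [
    ("1 view, no prompt", 20.0, 30),
    ("3 views, 20-token prompt", 39.2, 25),
]

for label, latency_ms, fps in reported:
    budget_ms = frame_period_ms(fps)
    headroom_ms = budget_ms - latency_ms
    print(f"{label}: budget {budget_ms:.1f} ms, headroom {headroom_ms:.1f} ms")

# 39.2 ms exceeds the 33.3 ms period of a 30 fps camera, so the
# three-view case only fits once the rate drops to 25 fps (40 ms period).
three_view_fits_30fps = 39.2 <= frame_period_ms(30)
three_view_fits_25fps = 39.2 <= frame_period_ms(25)

# Ratio of the reported trajectory frequency to the inference frequency.
points_per_inference = 480 // 30
print(f"trajectory points per inference step: {points_per_inference}")
```

The three-view configuration leaves under 1 ms of headroom at 25 fps, so any extra per-frame work, such as image preprocessing, would need to be overlapped with inference or budgeted separately.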

Maintenance & Community

No specific details on maintenance, community channels, or contributors are provided in the README snippet.

Licensing & Compatibility

The license type is not explicitly mentioned in the provided README snippet.

Limitations & Caveats

The inference kernels are specifically tuned for RTX 4090 and CUDA 12.6, though they are expected to function on similar platforms supporting torch and triton.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 47 stars in the last 30 days
