FluxVLA by FluxVLA

End-to-end platform for embodied AI and VLA engineering

Created 3 weeks ago


302 stars

Top 88.3% on SourcePulse

View on GitHub
Project Summary

FluxVLA is a full-stack, end-to-end engineering platform for embodied intelligence and vision-language-action (VLA) models. It targets researchers and engineers, standardizing VLA development and deployment from data to real-robot applications and significantly reducing engineering complexity.

How It Works

Built on unified configuration, standardized interfaces, and module decoupling, FluxVLA enables a complete engineering loop. It supports diverse VLA models (GR00T, Pi0.5, OpenVLA), LLM backbones (Llama, Gemma, Qwen), and vision backbones (DINOv2, SigLIP). Training strategies include FSDP, DDP, and LoRA, with support for Parquet datasets and safetensors weights. Features include multi-GPU evaluation, Real-Time Chunking (RTC) for trajectory continuity, and accelerated inference via Triton kernels and CUDA Graph capture.
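To make the Real-Time Chunking idea concrete, here is a minimal sketch of RTC-style blending: when a new action chunk arrives while part of the previous chunk is still unexecuted, the overlapping actions are cross-faded so the trajectory stays continuous. This is an illustration only, not FluxVLA's actual implementation; the function name and the linear blend schedule are assumptions.

```python
# Illustrative RTC-style chunk blending (assumed names, not FluxVLA's API).

def blend_chunks(prev_tail, new_chunk):
    """Blend the unexecuted tail of the previous action chunk into the new one.

    prev_tail: list of action vectors still pending from the old chunk.
    new_chunk: list of action vectors from the latest inference call.
    Returns a chunk the controller can execute without a discontinuity.
    """
    overlap = min(len(prev_tail), len(new_chunk))
    blended = []
    for i in range(overlap):
        # Weight shifts linearly from the old chunk toward the new one.
        w_new = (i + 1) / (overlap + 1)
        blended.append([
            (1.0 - w_new) * a_old + w_new * a_new
            for a_old, a_new in zip(prev_tail[i], new_chunk[i])
        ])
    # Beyond the overlap, the new chunk takes over entirely.
    blended.extend(new_chunk[overlap:])
    return blended
```

With an empty tail the new chunk is executed unchanged; with an overlap, early actions stay close to the old plan and later ones converge to the new one.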

Quick Start & Requirements

Installation requires a Python 3.10 conda environment, a CUDA-enabled PyTorch build (e.g., torch==2.6.0 via --index-url https://download.pytorch.org/whl/cu124), flash-attn, av, and the remaining dependencies via pip install -r requirements.txt. Key prerequisites are CUDA >= 12.4 and specific system libraries for EGL rendering. Experiment tracking via wandb or TensorBoard is supported, and links to PyTorch installation and EGL configuration guides are provided.
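A small pre-flight check for the prerequisites above (Python 3.10, CUDA >= 12.4) might look like the following. The helper names are hypothetical, not part of FluxVLA, and the CUDA version string would normally come from nvcc --version or torch.version.cuda.

```python
# Hypothetical prerequisite check (assumed helpers, not FluxVLA's API).
import sys

def version_at_least(version, minimum):
    """Compare dotted version strings numerically, e.g. '12.10' >= '12.4'."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(minimum)

def check_prereqs(cuda_version):
    """Return a list of unmet prerequisites (empty means good to go)."""
    problems = []
    if sys.version_info[:2] < (3, 10):
        problems.append("Python >= 3.10 required")
    if not version_at_least(cuda_version, "12.4"):
        problems.append("CUDA >= 12.4 required")
    return problems
```

Note that a plain string comparison would get this wrong ("12.10" < "12.4" lexically), which is why the versions are parsed into integer tuples.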

Highlighted Details

  • Achieves high LIBERO benchmark performance (e.g., FluxVLA(Pi) at 98.4%).
  • Single configuration file manages the entire workflow.
  • Broad support for VLA models and for LLM and vision backbones.
  • Advanced inference acceleration (Triton fused kernels, CUDA Graph).
  • Real-Time Chunking (RTC) for improved trajectory continuity.
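The single-configuration-file design highlighted above might be sketched as follows: one config object is read by the data, training, and evaluation stages, so they cannot drift out of sync. All field names here are hypothetical illustrations, not FluxVLA's actual schema.

```python
# Hypothetical unified config sketch (field names are assumptions,
# not FluxVLA's real schema).
from dataclasses import dataclass

@dataclass
class VLAConfig:
    model: str = "openvla"           # e.g. "gr00t", "pi05", "openvla"
    llm_backbone: str = "qwen"       # e.g. "llama", "gemma", "qwen"
    vision_backbone: str = "siglip"  # e.g. "dinov2", "siglip"
    strategy: str = "fsdp"           # "fsdp", "ddp", or "lora"
    dataset: str = "data/train.parquet"
    chunk_size: int = 16             # actions per inference call

def build_pipeline(cfg: VLAConfig):
    # Every stage reads the same config object, which is the point of
    # the single-configuration-file design.
    return {
        "data": {"path": cfg.dataset},
        "train": {"strategy": cfg.strategy, "model": cfg.model},
        "eval": {"chunk_size": cfg.chunk_size},
    }
```

Switching the whole workflow from FSDP to LoRA, for instance, would then be a one-field change rather than edits scattered across stage-specific configs.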

Maintenance & Community

The project acknowledges contributions from NVIDIA Isaac, OpenVLA, and Qwen. Support is available via GitHub issues. The roadmap includes expanding backbone support, integrating VLM/CoT data training, and adding support for tools like Isaac Sim.

Licensing & Compatibility

The license is not explicitly stated in the provided README; commercial use would require further investigation.

Limitations & Caveats

GR00T evaluation on LIBERO is unstable and sensitive to environmental factors. RTX 5090 requires updated Triton (3.2.0+). Installation can be complex, with potential issues related to CMake, NumPy version conflicts, and Hugging Face connectivity, often needing specific environment variable settings.

Health Check
Last Commit: 5 days ago
Responsiveness: Inactive
Pull Requests (30d): 16
Issues (30d): 3
Star History: 304 stars in the last 27 days
