FluxVLA by FluxVLA

End-to-end platform for embodied AI and VLA engineering

Created 3 weeks ago


302 stars

Top 88.3% on SourcePulse

View on GitHub
Project Summary

FluxVLA is a full-stack, end-to-end engineering platform for embodied intelligence and vision-language-action (VLA) models. It targets researchers and engineers, standardizing VLA development and deployment from data to real-robot applications and significantly reducing engineering complexity.

How It Works

Built on unified configuration, standardized interfaces, and module decoupling, FluxVLA enables a complete engineering loop. It supports diverse VLA models (GR00T, Pi0.5, OpenVLA), LLM backbones (Llama, Gemma, Qwen), and vision backbones (DINOv2, SigLIP). Training strategies include FSDP, DDP, and LoRA, with support for Parquet datasets and safetensors weights. Features include multi-GPU evaluation, Real-Time Chunking (RTC) for trajectory continuity, and accelerated inference via Triton kernels and CUDA Graph capture.
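To make the Real-Time Chunking idea concrete, here is a minimal sketch of RTC-style blending: when a new action chunk arrives while part of the previous chunk is still unexecuted, the overlapping actions are cross-faded so the trajectory stays continuous. This is an illustration only, not FluxVLA's actual implementation; the function name and the linear blend schedule are assumptions.

```python
# Illustrative RTC-style chunk blending (assumed names, not FluxVLA's API).

def blend_chunks(prev_tail, new_chunk):
    """Blend the unexecuted tail of the previous action chunk into the new one.

    prev_tail: list of action vectors still pending from the old chunk.
    new_chunk: list of action vectors from the latest inference call.
    Returns a chunk the controller can execute without a discontinuity.
    """
    overlap = min(len(prev_tail), len(new_chunk))
    blended = []
    for i in range(overlap):
        # Weight shifts linearly from the old chunk toward the new one.
        w_new = (i + 1) / (overlap + 1)
        blended.append([
            (1.0 - w_new) * a_old + w_new * a_new
            for a_old, a_new in zip(prev_tail[i], new_chunk[i])
        ])
    # Beyond the overlap, the new chunk takes over entirely.
    blended.extend(new_chunk[overlap:])
    return blended
```

With an empty tail the new chunk is executed unchanged; with an overlap, early actions stay close to the old plan and later ones converge to the new one.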

Quick Start & Requirements

Installation requires a Python 3.10 conda environment, a CUDA-enabled PyTorch build (e.g., torch==2.6.0 via --index-url https://download.pytorch.org/whl/cu124), flash-attn, av, and the remaining dependencies via pip install -r requirements.txt. Key prerequisites are CUDA >= 12.4 and specific system libraries for EGL rendering. Experiment tracking via wandb or TensorBoard is supported, and links to PyTorch installation and EGL configuration guides are provided.
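A small pre-flight check for the prerequisites above (Python 3.10, CUDA >= 12.4) might look like the following. The helper names are hypothetical, not part of FluxVLA, and the CUDA version string would normally come from nvcc --version or torch.version.cuda.

```python
# Hypothetical prerequisite check (assumed helpers, not FluxVLA's API).
import sys

def version_at_least(version, minimum):
    """Compare dotted version strings numerically, e.g. '12.10' >= '12.4'."""
    parse = lambda v: tuple(int(p) for p in v.split("."))
    return parse(version) >= parse(minimum)

def check_prereqs(cuda_version):
    """Return a list of unmet prerequisites (empty means good to go)."""
    problems = []
    if sys.version_info[:2] < (3, 10):
        problems.append("Python >= 3.10 required")
    if not version_at_least(cuda_version, "12.4"):
        problems.append("CUDA >= 12.4 required")
    return problems
```

Note that a plain string comparison would get this wrong ("12.10" < "12.4" lexically), which is why the versions are parsed into integer tuples.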

Highlighted Details

  • Achieves high LIBERO benchmark performance (e.g., FluxVLA(Pi) at 98.4%).
  • Single configuration file manages the entire workflow.
  • Broad support for VLA models and for LLM and vision backbones.
  • Advanced inference acceleration (Triton fused kernels, CUDA Graph).
  • Real-Time Chunking (RTC) for improved trajectory continuity.
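The single-configuration-file design highlighted above might be sketched as follows: one config object is read by the data, training, and evaluation stages, so they cannot drift out of sync. All field names here are hypothetical illustrations, not FluxVLA's actual schema.

```python
# Hypothetical unified config sketch (field names are assumptions,
# not FluxVLA's real schema).
from dataclasses import dataclass

@dataclass
class VLAConfig:
    model: str = "openvla"           # e.g. "gr00t", "pi05", "openvla"
    llm_backbone: str = "qwen"       # e.g. "llama", "gemma", "qwen"
    vision_backbone: str = "siglip"  # e.g. "dinov2", "siglip"
    strategy: str = "fsdp"           # "fsdp", "ddp", or "lora"
    dataset: str = "data/train.parquet"
    chunk_size: int = 16             # actions per inference call

def build_pipeline(cfg: VLAConfig):
    # Every stage reads the same config object, which is the point of
    # the single-configuration-file design.
    return {
        "data": {"path": cfg.dataset},
        "train": {"strategy": cfg.strategy, "model": cfg.model},
        "eval": {"chunk_size": cfg.chunk_size},
    }
```

Switching the whole workflow from FSDP to LoRA, for instance, would then be a one-field change rather than edits scattered across stage-specific configs.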

Maintenance & Community

The project acknowledges contributions from NVIDIA Isaac, OpenVLA, and Qwen. Support is available via GitHub issues. The roadmap includes expanding backbone support, integrating VLM/CoT data training, and adding support for tools like Isaac Sim.

Licensing & Compatibility

The license is not explicitly stated in the provided README; commercial use would require further investigation.

Limitations & Caveats

GR00T evaluation on LIBERO is unstable and sensitive to environmental factors. RTX 5090 requires updated Triton (3.2.0+). Installation can be complex, with potential issues related to CMake, NumPy version conflicts, and Hugging Face connectivity, often needing specific environment variable settings.

Health Check
Last Commit: 5 days ago
Responsiveness: Inactive
Pull Requests (30d): 16
Issues (30d): 3
Star History: 304 stars in the last 27 days
