EGalahad
Performant stack for Vision-Language-Action model training and serving
Top 89.7% on SourcePulse
Summary
VLA-Scratch offers a modular, performant, and efficient stack for training, evaluating, and serving Vision-Language-Action (VLA) models. It targets engineers and researchers, aiming to make VLA model development fast and approachable by minimizing dependencies and optimizing performance.
How It Works
The stack uses TensorClass for explicit data boundaries, giving a typed, modular codebase with clear data flow that supports co-training on heterogeneous datasets. For performance, the VLM forward pass is optimized to eliminate host-device syncs, and the stack relies on native PyTorch features such as FSDP2 and gradient checkpointing for targeted tuning. Experimentation runs through a Hydra workflow: configurations can be registered and overridden using the same grammar across the training, evaluation, and serving scripts.
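To make the host-device sync point concrete, here is a minimal sketch of the general pattern. It is not taken from VLA-Scratch; the function names and the masked-loss scenario are hypothetical and chosen only for illustration. The first version calls `.item()`, which forces the CPU to wait for the GPU. The second keeps the whole computation on the device.

```python
import torch


def masked_mean_with_sync(loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Masked mean that branches in Python on a tensor value (hypothetical example)."""
    # .item() copies a scalar to the host, so the CPU blocks until the
    # device has finished all queued work that produces mask.sum().
    n_valid = mask.sum().item()
    if n_valid == 0:
        return loss.new_zeros(())
    return (loss * mask).sum() / n_valid


def masked_mean_no_sync(loss: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Same result, with the empty-mask case handled on the device."""
    # clamp(min=1) avoids division by zero without reading the value on
    # the host, so no Python control flow depends on device data.
    n_valid = mask.sum().clamp(min=1)
    return (loss * mask).sum() / n_valid


if __name__ == "__main__":
    loss = torch.tensor([1.0, 2.0, 3.0, 4.0])
    mask = torch.tensor([1.0, 0.0, 1.0, 0.0])
    print(masked_mean_with_sync(loss, mask), masked_mean_no_sync(loss, mask))
```

On CPU the two versions behave identically. The difference only matters on an accelerator, where the sync-free version lets kernel launches stay queued ahead of execution.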
Quick Start & Requirements
Set up the environment with uv: GIT_LFS_SKIP_SMUDGE=1 uv sync. Commands are provided for training (uv run torchrun ... scripts/train_policy.py), evaluation (uv run scripts/eval_policy.py), and serving (uv run scripts/serve_policy.py). Dependencies include torchrun, wandb, and pyav. The README notes that RTX 5090 GPUs with CUDA 12.8 may hit compatibility issues with stable PyTorch and recommends PyTorch-Nightly in that case. Further details are in scripts/README.md and examples/libero.
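For the RTX 5090 note, a quick way to see whether an installed PyTorch build targets the local GPU is to compare the device's compute capability with the build's architecture list. The sketch below is not part of the repository; `arch_supported` and `report` are hypothetical helper names. It is a rough check that looks only for a native `sm_XY` entry and ignores PTX forward compatibility.

```python
from typing import Iterable, Tuple


def arch_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if the build lists native kernels for this compute capability."""
    major, minor = capability
    return f"sm_{major}{minor}" in set(arch_list)


def report() -> str:
    """Describe whether the installed PyTorch build targets the local GPU."""
    try:
        import torch  # imported lazily so the helper above works without it
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: no CUDA device visible"
    cap = torch.cuda.get_device_capability(0)
    ok = arch_supported(cap, torch.cuda.get_arch_list())
    status = "supported" if ok else "NOT in this build's arch list"
    return (
        f"PyTorch {torch.__version__} (CUDA {torch.version.cuda}): "
        f"sm_{cap[0]}{cap[1]} {status}"
    )


if __name__ == "__main__":
    print(report())
```

If the device's architecture is missing from the list, that is consistent with the situation the README describes, and switching to a nightly build is the suggested remedy.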
Highlighted Details
TensorClass data model enables composable modules and heterogeneous dataset co-training.
Maintenance & Community
No specific details regarding contributors, community channels, or roadmaps are present in the provided README.
Licensing & Compatibility
The README does not specify a software license, potentially impacting commercial use or integration into closed-source projects.
Limitations & Caveats
Users with RTX 5090 GPUs and CUDA 12.8 may face issues with stable PyTorch, requiring PyTorch-Nightly. The troubleshooting section is marked "To be Continued." The absence of a specified license is a significant adoption caveat.