TRI-ML/VLA Foundry: A unified framework for training Vision-Language-Action models
Top 84.6% on SourcePulse
VLA Foundry is a unified framework designed for training Vision-Language-Action (VLA) models, enabling seamless progression from Large Language Models (LLMs) to Vision-Language Models (VLMs) and finally to VLAs within a single environment. It targets researchers and engineers working with multi-modal AI, offering a flexible and efficient platform to streamline complex training pipelines without external dependencies. The framework's modular design and support for multi-node training accelerate development and deployment of advanced AI agents.
How It Works
The framework uses a modular, pure-PyTorch architecture that is easy to modify and extend. It supports training across multiple modalities, including text, image-caption pairs, and robotics data. VLA Foundry integrates with Hugging Face, letting users load pre-trained weights for LLMs, VLMs, and CLIP models. For distributed training, it uses FSDP2 and streams data via WebDataset, supporting both multi-node setups on clusters such as AWS SageMaker and local multi-GPU runs with torchrun. Dataset mixing is supported: users can specify sources and mixing ratios at dataloading time for balanced batching.
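The ratio-based dataset mixing described above can be sketched roughly as follows. This is a minimal illustration of the idea, not VLA Foundry's actual implementation; the `mix_streams` helper and its signature are hypothetical.

```python
import random
from typing import Dict, Iterable, Iterator, Tuple


def mix_streams(streams: Dict[str, Iterable], ratios: Dict[str, float],
                seed: int = 0) -> Iterator[Tuple[str, object]]:
    """Interleave samples from several sources according to mixing ratios.

    Each yielded item is drawn from one source, chosen with probability
    proportional to its ratio -- similar in spirit to balancing text,
    image-caption, and robotics data within a training stream.
    """
    rng = random.Random(seed)
    iters = {name: iter(src) for name, src in streams.items()}
    while iters:
        names = list(iters)
        name = rng.choices(names, weights=[ratios[n] for n in names])[0]
        try:
            yield name, next(iters[name])
        except StopIteration:
            del iters[name]  # source exhausted; keep mixing the rest


# Example: mix two toy sources at a 3:1 ratio.
mixed = list(mix_streams(
    {"captions": range(30), "robotics": range(10)},
    {"captions": 0.75, "robotics": 0.25},
))
```

A real pipeline would apply the same weighted-choice idea over sharded WebDataset streams rather than in-memory iterables.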
Quick Start & Requirements
uv is recommended for environment management. After installing uv, create a Python 3.12 virtual environment and install the dependencies with:
uv sync
uv pip install -e .
The recommended workflow is uv run <script> <args>. A quickstart command for training a VLM is provided in the repository; it requires AWS credentials for S3 data access and a Hugging Face token.
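A typical invocation following this workflow might look like the sketch below. The script name, config path, and flags are hypothetical placeholders; consult the repository's README for the actual quickstart command.

```shell
# Credentials for S3-hosted data and gated Hugging Face weights
export AWS_ACCESS_KEY_ID=...      # or use `aws configure`
export AWS_SECRET_ACCESS_KEY=...
export HF_TOKEN=...

# Run a training script inside the uv-managed environment
# (script name and arguments are illustrative, not from the repo)
uv run scripts/train.py --config_path configs/vlm_quickstart.yaml
```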
Highlighted Details
Configuration is handled with draccus, supporting nested parameters and YAML preset includes.
Maintenance & Community
The repository includes contribution guidelines (CONTRIBUTING.md) and a troubleshooting FAQ (FAQ.md). Specific community channels (e.g., Discord, Slack) or a public roadmap are not explicitly mentioned in the README.
Licensing & Compatibility
The README does not specify a software license. This absence is a critical factor for adoption, as it leaves the terms of use, distribution, and modification undefined, potentially restricting commercial or closed-source integration.
Limitations & Caveats
A known limitation affects YAML configuration includes: overriding nested parameters from an included file is not straightforward and may require redefining all parameters or passing command-line arguments. Additionally, tests requiring AWS S3 access are not pre-configured; small, local datasets are recommended for testing.
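As a workaround for the include-override limitation, nested values can be supplied on the command line, since draccus maps dotted flags onto nested dataclass fields. Below is a minimal sketch: the config classes and field names are hypothetical, and the draccus call is shown in a comment so the snippet stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class OptimizerConfig:
    lr: float = 3e-4           # hypothetical nested parameter
    weight_decay: float = 0.01


@dataclass
class TrainConfig:
    run_name: str = "vlm-quickstart"
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)


# With draccus, a training script would parse this as:
#   import draccus
#   cfg = draccus.parse(config_class=TrainConfig)
# and a nested field could then be overridden from the CLI instead of
# editing the included YAML preset, e.g. (paths illustrative):
#   uv run train.py --config_path preset.yaml --optimizer.lr 1e-4
```

This avoids redefining the entire included preset when only one or two nested values need to change.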