mli0603: Multimodal VLA training and evaluation framework
Top 99.8% on SourcePulse
Team Comet's OpenPi Comet repository provides a unified framework for training Vision-Language-Action (VLA) models for the 2025 BEHAVIOR Challenge. It covers pre-training, post-training, data generation, and evaluation on the BEHAVIOR-1K dataset. The codebase gives researchers and engineers an end-to-end VLA training strategy that earned the team 2nd place in the challenge and has since been improved further.
How It Works
The framework employs a distributed training infrastructure supporting multi-dataset sharding. It offers diverse pre-training setups, incorporating hierarchical instructions (global, subtask, skill) and multimodal observations (RGB, depth, point cloud, etc.). Post-training is achieved via Rejection Sampling Fine-Tuning (RFT), which includes automated dataset construction and simulation rollouts. The approach prioritizes native OpenPi compatibility for seamless integration.
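The rejection-sampling idea behind RFT can be illustrated with a minimal sketch: roll out the current policy in simulation, keep only the successful episodes, and use those as the fine-tuning dataset. Everything below is hypothetical and for illustration only. `Episode`, `toy_policy`, `rollout`, and `build_rft_dataset` are invented names, the "simulator" is a toy success check, and none of it reflects the repository's actual API or its dataset construction pipeline.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Episode:
    task: str
    actions: List[int]
    success: bool


# Hypothetical policy: maps a task to one discrete action per step.
Policy = Callable[[str, random.Random], int]


def toy_policy(task: str, rng: random.Random) -> int:
    """Toy stand-in for a VLA policy: picks the 'good' action 80% of the time."""
    return 1 if rng.random() < 0.8 else 0


def rollout(policy: Policy, task: str, rng: random.Random, horizon: int = 3) -> Episode:
    """Toy stand-in for a simulation rollout.

    Assumption: an episode succeeds only if every action is the 'good' one.
    A real rollout would step a simulator and read its success signal.
    """
    actions = [policy(task, rng) for _ in range(horizon)]
    return Episode(task=task, actions=actions, success=all(a == 1 for a in actions))


def build_rft_dataset(
    policy: Policy, tasks: List[str], samples_per_task: int, rng: random.Random
) -> List[Episode]:
    """Sample rollouts per task and reject the failures.

    The episodes returned here are the ones a supervised fine-tuning
    step would then train on.
    """
    kept: List[Episode] = []
    for task in tasks:
        for _ in range(samples_per_task):
            episode = rollout(policy, task, rng)
            if episode.success:  # rejection step: drop failed rollouts
                kept.append(episode)
    return kept


if __name__ == "__main__":
    dataset = build_rft_dataset(toy_policy, ["task_a", "task_b"], 50, random.Random(0))
    print(f"kept {len(dataset)} of 100 sampled episodes")
```

The sketch stops at dataset construction on purpose: the fine-tuning step itself depends on the training stack and is not described in enough detail here to reproduce.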
Quick Start & Requirements
Clone the repositories (openpi-comet, BEHAVIOR-1K), install uv, sync dependencies (uv sync), install the package (uv pip install -e .), and activate the environment (source .venv/bin/activate). Install BEHAVIOR dependencies separately.
https://github.com/mli0603/openpi-comet.git
https://github.com/StanfordVL/BEHAVIOR-1K.git
(… comet-1.5k) released Dec 2025/Jan 2026.
Highlighted Details
Maintenance & Community
Developed by Team Comet, the project encourages community feedback and contributions via GitHub issues and discussions. Key contributors are listed in the citation. No explicit roadmap or dedicated community channels (like Discord/Slack) are mentioned.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README, which may pose a risk for commercial use or integration into closed-source projects. The codebase has been tested on Ubuntu 22.04 and is designed for native compatibility with OpenPi.
Limitations & Caveats
The current training script does not support multi-node distributed training. Specific NVIDIA GPU memory requirements are detailed for different operational modes. The lack of a clearly stated license is a significant caveat for adoption.