LatentActionPretraining: VLA pretraining via unsupervised latent action learning from video
Top 72.6% on SourcePulse
LAPA (Latent Action Pretraining) is an unsupervised approach for pretraining Vision-Language-Action (VLA) models, targeting researchers and engineers in robotics and embodied AI. It enables the creation of state-of-the-art VLA models with significantly improved pretraining efficiency, outperforming models trained with ground-truth action labels.
How It Works
LAPA leverages latent action quantization to pretrain VLA models without requiring explicit robot action labels. It quantizes actions into a discrete latent space, allowing for unsupervised learning from video data. This approach achieves over 30x greater pretraining efficiency compared to conventional methods.
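To make the idea concrete, here is a minimal NumPy sketch of the quantization step: a pair of consecutive frames is encoded into a few continuous latents, and each latent is snapped to its nearest entry in a small codebook, producing a discrete latent action without any robot action labels. The encoder, shapes, and names (encode_frame_pair, quantize, the 8-entry codebook with 4 tokens) are illustrative assumptions, not LAPA's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

CODEBOOK_SIZE = 8   # vocabulary per latent token (assumed)
NUM_TOKENS = 4      # tokens per latent action (assumed), giving 8^4 possible actions
LATENT_DIM = 16     # embedding dimension of each token (assumed)

# Codebook of discrete latent-action entries (randomly initialized for illustration;
# in practice it would be learned jointly with the encoder).
codebook = rng.normal(size=(CODEBOOK_SIZE, LATENT_DIM))

def encode_frame_pair(frame_t, frame_t1):
    """Stand-in encoder: map two consecutive frames to NUM_TOKENS continuous latents."""
    delta = (frame_t1 - frame_t).reshape(-1)                     # pixel change as a crude motion signal
    proj = rng.normal(size=(delta.size, NUM_TOKENS * LATENT_DIM))
    return (delta @ proj).reshape(NUM_TOKENS, LATENT_DIM)

def quantize(latents):
    """Assign each continuous latent to its nearest codebook entry."""
    dists = np.linalg.norm(latents[:, None, :] - codebook[None, :, :], axis=-1)
    indices = dists.argmin(axis=1)                               # shape (NUM_TOKENS,)
    return indices, codebook[indices]

frame_t = rng.random((64, 64, 3))
frame_t1 = rng.random((64, 64, 3))

indices, quantized = quantize(encode_frame_pair(frame_t, frame_t1))
print("discrete latent action:", indices)                       # e.g. [3 0 7 5]
```

The VLA model is then pretrained to predict these discrete tokens from video and language, which is why no ground-truth robot actions are needed at this stage.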
Quick Start & Requirements
- Install dependencies with pip install -r requirements.txt inside a conda environment.
- Download the pretrained checkpoints (tokenizer.model, vqgan, params) from Huggingface.
- Run inference with python -m latent_pretraining.inference after setting up the checkpoints.
- Fine-tune with the provided scripts (scripts/finetune_real.sh, scripts/finetune_simpler.sh). Fine-tuning experiments were conducted with 4x 80GB A100 GPUs.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The inference script outputs actions in a discrete latent action space ($8^4$ possible latent actions), not the real action space; fine-tuning is required to map them to physical robot actions. Training latent action quantization on a custom dataset requires modifying the data-loading code to match the Something-Something V2 dataset structure.
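As a concrete illustration of that mapping step, the hypothetical sketch below decodes a discrete latent action into a continuous robot command through a small learned head; the embedding table, readout weights, 7-dimensional action, and the latent_to_action helper are assumptions for illustration and are not part of the LAPA codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, TOKENS, ACTION_DIM = 8, 4, 7   # 8^4 latent actions -> e.g. a 7-DoF command (assumed)

# One embedding table per token position plus a linear readout (random stand-ins;
# during fine-tuning these would be trained on robot data with real action labels).
embeddings = rng.normal(size=(TOKENS, VOCAB, 16))
readout_w = rng.normal(size=(TOKENS * 16, ACTION_DIM)) * 0.1
readout_b = np.zeros(ACTION_DIM)

def latent_to_action(latent_tokens):
    """Map a (TOKENS,) array of codebook indices to a continuous action vector."""
    feats = np.concatenate([embeddings[i, t] for i, t in enumerate(latent_tokens)])
    return feats @ readout_w + readout_b

latent_action = np.array([3, 0, 7, 5])   # example output of the inference script
print(latent_to_action(latent_action))    # continuous 7-dim robot action
```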