simlingo by RenzKa

Vision-only autonomous driving with language-action alignment

Created 10 months ago
316 stars

Top 85.6% on SourcePulse

Project Summary

SimLingo tackles vision-only, closed-loop autonomous driving by combining language understanding and action generation in a single model. It targets researchers and engineers, pairing state-of-the-art driving performance with language capabilities such as visual question answering (VQA) and instruction following.

How It Works

This project implements a Vision-Language-Action (VLA) model within the CARLA simulator, building upon the CARLA Garage framework. It leverages the PDM-lite expert for data collection and introduces "Action Dreaming" for enhanced language-action alignment. The approach enables a vision-only system to perform complex driving tasks and respond to linguistic queries or instructions.

Quick Start & Requirements

  • Installation: Clone the repository, set up CARLA 0.9.15 with setup_carla.sh, create the Conda environment from environment.yaml, then install PyTorch 2.2.0 and flash-attn 2.7.0.post2.
  • Prerequisites: CARLA 0.9.15, PyTorch 2.2.0, flash-attn 2.7.0.post2, CUDA (implied), Git LFS for the dataset download, and an OpenAI API key for language evaluation.
  • Configuration: Set environment variables such as CARLA_ROOT, WORK_DIR, and PYTHONPATH.
  • Dataset: Available at https://huggingface.co/datasets/RenzKa/simlingo.
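
The requirements above can be checked before launching anything. The following is a minimal sketch, not part of the SimLingo repository: it assumes the three environment variables named above are the ones that matter (the real list may be longer) and that the pinned packages are installed under the distribution names torch and flash-attn. The helper names missing_env_vars, installed_version, and version_mismatches are hypothetical.

```python
import os
from importlib import metadata

# Variable names taken from the summary above; the repository may need more.
REQUIRED_ENV_VARS = ("CARLA_ROOT", "WORK_DIR", "PYTHONPATH")

# Versions quoted in the summary; the distribution names are assumptions.
PINNED_VERSIONS = {"torch": "2.2.0", "flash-attn": "2.7.0.post2"}


def missing_env_vars(env=None):
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(lookup=installed_version):
    """Return {distribution: (expected, found)} for every unsatisfied pin."""
    mismatches = {}
    for dist, expected in PINNED_VERSIONS.items():
        found = lookup(dist)
        # Ignore local build tags such as "+cu121" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[dist] = (expected, found)
    return mismatches


if __name__ == "__main__":
    print("missing env vars:", missing_env_vars())
    print("version mismatches:", version_mismatches())
```

Running the script inside the Conda environment prints two collections; both being empty suggests the variables and pinned versions match what the summary lists. It does not verify that CARLA itself starts or that the paths point to valid directories.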

Highlighted Details

  • Accepted as a highlight paper at CVPR 2025.
  • Achieves state-of-the-art driving performance on CARLA Leaderboard and Bench2Drive.
  • Supports visual question answering (VQA), driving commentary, and instruction following.
  • Introduces a novel "Action Dreaming" dataset for improved language-action alignment.

Maintenance & Community

The README provides no direct links to community channels (Discord, Slack) or a roadmap. Maintenance status is uncertain, with a note indicating potential future cleanup of evaluation scripts.

Licensing & Compatibility

The repository's license is not specified in the README, making its terms for use, modification, and distribution unclear. Commercial use compatibility is therefore undetermined.

Limitations & Caveats

The released model and dataset are reproductions, leading to slight deviations from original paper results. Language evaluation scripts may be subject to future cleanup. Data generation scripts for VQA and commentary are tightly coupled to specific simulator state information, limiting their reusability with custom datasets. The Bench2Drive benchmark is noted as a "training" benchmark due to potential data leakage.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 5
  • Star History: 27 stars in the last 30 days
