Awesome-Spatial-VLMs by vulab-AI

A comprehensive resource for spatial intelligence in vision-language models

Created 1 year ago

530 stars

Top 58.9% on SourcePulse

Project Summary

This repository is the community-maintained companion to the survey paper "Spatial Intelligence in Vision-Language Models: A Comprehensive Survey." It gives structure to research on spatial reasoning in vision-language models (VLMs), a rapidly evolving field, by collecting papers, datasets, benchmarks, and evaluation tools in one curated hub. The intended audience is researchers, engineers, and power users who want to understand or advance the spatial capabilities of VLMs.

How It Works

The project organizes spatial intelligence in VLMs into a 3-level cognitive hierarchy: L1 Perception of intrinsic 3D attributes, L2 Relational Understanding, and L3 Extrapolation. Methods are further grouped into five families that map the current research landscape. The repository also aggregates relevant datasets and benchmarks and maintains a public leaderboard of performance metrics for numerous VLMs.
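The three-level hierarchy described above can be pictured as a small data structure. The following is a minimal sketch, not the repository's own code: the names `CognitiveLevel`, `CatalogEntry`, and `group_by_level` are hypothetical, the entry titles are placeholders, and the method family is left as a free-form label because the summary does not name the five families.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import IntEnum


class CognitiveLevel(IntEnum):
    """The three levels named in the summary (identifiers are illustrative)."""
    PERCEPTION = 1      # L1: perception of intrinsic 3D attributes
    RELATIONAL = 2      # L2: relational understanding
    EXTRAPOLATION = 3   # L3: extrapolation


@dataclass(frozen=True)
class CatalogEntry:
    title: str
    level: CognitiveLevel
    method_family: str  # placeholder label; the real family names are not given here


def group_by_level(entries):
    """Return a dict mapping each cognitive level to the titles filed under it."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.level].append(entry.title)
    return dict(grouped)


# Placeholder entries, not real papers from the catalog.
entries = [
    CatalogEntry("Example paper A", CognitiveLevel.PERCEPTION, "family-1"),
    CatalogEntry("Example paper B", CognitiveLevel.RELATIONAL, "family-2"),
    CatalogEntry("Example paper C", CognitiveLevel.PERCEPTION, "family-3"),
]

by_level = group_by_level(entries)
```

Using an `IntEnum` keeps the levels ordered, so entries can be sorted from perception up to extrapolation without extra bookkeeping.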

Quick Start & Requirements

This repository is a curated knowledge base and evaluation framework, not a deployable application. An official website is available for streamlined navigation. An evaluation toolkit, documented in evaluation/README.md, supports benchmarking commercial, general-purpose, and specialized spatial VLMs under standardized protocols. Installation and hardware requirements are not specified; running the toolkit presumably requires access to the VLMs being evaluated.

Highlighted Details

Comprehensive catalog of over 300 research papers, datasets, and benchmarks related to spatial VLM intelligence.
Organized taxonomy based on a cognitive hierarchy (Perception, Relation, Extrapolation) and method families.
Main Leaderboard comparing 37+ VLMs across 9 benchmarks, with scores presented as QA Accuracy.
An open Evaluation Toolkit for reproducible benchmarking of new and existing VLMs.
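The leaderboard item above reports scores as QA Accuracy. As a rough illustration of what such a metric involves, the sketch below computes per-benchmark accuracy by exact match after light normalization. This is an assumption made for illustration only: the scoring rules actually used by the toolkit are defined in evaluation/README.md and may differ, and `normalize`, `qa_accuracy`, and the benchmark names are hypothetical.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and trim an answer so trivial formatting differences do not count as errors."""
    return answer.strip().lower()


def qa_accuracy(records):
    """Compute accuracy (in percent) per benchmark.

    records: iterable of (benchmark, prediction, gold_answer) tuples.
    Exact match after normalization is an assumed scoring rule.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for benchmark, prediction, gold in records:
        total[benchmark] += 1
        if normalize(prediction) == normalize(gold):
            correct[benchmark] += 1
    return {name: 100.0 * correct[name] / total[name] for name in total}


# Placeholder records, not real benchmark data.
records = [
    ("bench-a", "A", "a"),          # match after lowercasing
    ("bench-a", "B", "C"),          # mismatch
    ("bench-b", "left", "left "),   # match after trimming
]

scores = qa_accuracy(records)
```

Reporting one score per benchmark, rather than a single pooled number, mirrors a leaderboard that compares models across several benchmarks.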

Maintenance & Community

The repository is actively maintained and invites community contributions, including submissions for new papers, datasets, or models via pull requests or issues. Regular updates are expected. No specific community channels like Discord or Slack are listed.

Licensing & Compatibility

The provided README content does not specify a software license. Without one, compatibility with commercial or closed-source projects cannot be assessed.

Limitations & Caveats

The repository's primary function is curation and evaluation; it does not provide pre-trained models or direct inference capabilities. The lack of explicit licensing information is a notable caveat for potential adopters.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days