SenseNova-SI by OpenSenseNova

Scaling multimodal models to achieve advanced spatial intelligence

Created 8 months ago

282 stars

Top 92.2% on SourcePulse

Project Summary

SenseNova-SI: Scaling Spatial Intelligence with Multimodal Foundation Models

This project addresses the limitations of current multimodal foundation models in spatial intelligence by introducing the SenseNova-SI family. It offers researchers and practitioners enhanced capabilities in understanding and generating spatial information, leveraging large-scale, curated datasets and established multimodal architectures. The primary benefit is achieving state-of-the-art performance on diverse spatial intelligence benchmarks while maintaining general multimodal understanding.

How It Works

SenseNova-SI scales existing multimodal foundation models, such as Qwen3-VL, InternVL3, and Bagel, by training them on a meticulously curated dataset named SenseNova-SI-8M. This dataset comprises approximately 8.16 million diverse samples derived from 151 sources, systematically covering a broad taxonomy of spatial capabilities. This data-centric approach aims to cultivate robust spatial reasoning and generalization.
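For a sense of scale, 8.16 million samples across 151 sources averages out to roughly 54,000 samples per source. The summary does not say how the samples are actually distributed. The sketch below shows one way a data-centric pipeline might check that a capability taxonomy is covered.

The record fields ("source", "capability"), the function name summarize_coverage, and the capability labels in the toy example are all hypothetical. They are not the SenseNova-SI-8M schema.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize_coverage(
    records: Iterable[Mapping[str, str]],
    taxonomy: Iterable[str],
) -> dict:
    """Tally samples per capability and report gaps against a taxonomy.

    Hypothetical schema: every record carries a "source" and a "capability" key.
    """
    records = list(records)  # materialize so generators can be iterated twice
    taxonomy = list(taxonomy)
    per_capability = Counter(r["capability"] for r in records)
    sources = {r["source"] for r in records}
    known = set(taxonomy)
    return {
        "samples": len(records),
        "sources": len(sources),
        "per_capability": per_capability,
        # Taxonomy entries with zero samples (Counter returns 0 for absent keys).
        "missing": [cap for cap in taxonomy if per_capability[cap] == 0],
        # Labels present in the data but absent from the taxonomy.
        "unexpected": sorted(set(per_capability) - known),
    }


toy = [
    {"source": "src_a", "capability": "relative_depth"},
    {"source": "src_b", "capability": "relative_depth"},
    {"source": "src_b", "capability": "mental_rotation"},
]
report = summarize_coverage(toy, ["relative_depth", "mental_rotation", "perspective_taking"])
print(report["missing"])  # ['perspective_taking']
```

Reporting both missing and unexpected labels catches drift in either direction: categories with no data, and data whose labels are not in the taxonomy.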

Quick Start & Requirements

Installation: Clone the repository, then create the environment with uv, choosing the extra that matches your CUDA version, for example uv sync --extra cu124. Other CUDA extras, such as cu118, cu121 and cu126, are also available.
Prerequisites: CUDA Toolkit (the version matching the extra chosen for uv sync) and Python 3.10 or later, ideally set up in a conda environment.
Links:
- uv installation: https://docs.astral.sh/uv/getting-started/installation/#installing-uv
- Examples: example.py, example_bagel.py
- Evaluation framework: EASI (details not provided in README)
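The CUDA extra passed to uv sync follows a cuXYZ naming pattern (cu118, cu121, cu124, cu126). The Python sketch below picks the extra automatically. It rests on two assumptions:

- The extra should exactly match the major.minor release reported by nvcc --version. This is a simplification.
- Only the four extras named above exist. The summary says "etc.", so the real list may be longer.

The helper names are hypothetical and not part of the repository.

```python
import re

# Extras named in the project summary; the real list may be longer ("etc.").
KNOWN_EXTRAS = ("cu118", "cu121", "cu124", "cu126")


def cuda_extra_from_nvcc(nvcc_output: str) -> str:
    """Map `nvcc --version` text (e.g. "release 12.4, V12.4.131") to "cu124"."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    major, minor = match.groups()
    extra = f"cu{major}{minor}"
    # Simplifying assumption: require an exact match with a listed extra.
    if extra not in KNOWN_EXTRAS:
        raise ValueError(f"{extra} is not among the listed extras {KNOWN_EXTRAS}")
    return extra


def uv_sync_command(extra: str) -> list[str]:
    """Build the argument list for `uv sync --extra <extra>`."""
    return ["uv", "sync", "--extra", extra]


sample = "Cuda compilation tools, release 12.4, V12.4.131"
print(uv_sync_command(cuda_extra_from_nvcc(sample)))
# ['uv', 'sync', '--extra', 'cu124']
```

To use it on a real machine:

1. Capture the compiler output with subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout.
2. Pass the resulting command to subprocess.run(..., check=True) from the repository root.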

Highlighted Details

SenseNova-SI-1.3-InternVL3-8B achieves state-of-the-art performance across eight benchmarks: VSI, MMSI, MindCube, ViewSpatial, SITE, BLINK, 3DSRBench and EmbSpatial-Bench.
Other releases show strong results on specific tasks:
- SenseNova-SI-1.4-InternVL3-8B excels in grounding (89.21 average on RefCOCO) and depth estimation (95.56 on relative depth).
- SenseNova-SI-1.5-InternVL3-8B shows improved solid geometry reasoning (63.5 MCQ accuracy on SolidGeo).
The full-scale training dataset, SenseNova-SI-8M (~8.16 million samples), has been released to facilitate further research.
The project includes analyses on data scaling, emergent generalization, and spatial chain-of-thought reasoning.
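Multiple-choice scores such as the SolidGeo MCQ accuracy are commonly computed in two steps. First, an option letter is extracted from each model response. Second, that letter is compared with the gold label. When a model reasons step by step before answering, as in spatial chain-of-thought, the extractor must ignore letters mentioned mid-reasoning.

The sketch below makes three assumptions:

- Responses follow a hypothetical "Answer: X" convention with options A to E.
- The last tagged answer in a response is the one that counts.
- A response that is just a bare letter is also accepted.

It is not the EASI scorer, whose details the README does not provide.

```python
import re
from typing import Optional, Sequence

# Hypothetical convention: "... reasoning ... Answer: B" (optionally "Answer: (B)").
_TAGGED = re.compile(r"[Aa]nswer\s*:\s*\(?([A-E])\b")
# Fallback: the whole response is just a letter, e.g. "B", "(B)" or "B.".
_BARE = re.compile(r"^\(?([A-E])\)?\.?$")


def extract_choice(response: str) -> Optional[str]:
    """Return the chosen option letter, or None if none can be found."""
    tagged = _TAGGED.findall(response)
    if tagged:
        return tagged[-1]  # last tagged answer wins over letters in the reasoning
    bare = _BARE.match(response.strip())
    return bare.group(1) if bare else None


def mcq_accuracy(responses: Sequence[str], gold: Sequence[str]) -> float:
    """Percentage of responses whose extracted letter equals the gold letter."""
    if len(responses) != len(gold):
        raise ValueError("responses and gold labels must align")
    if not gold:
        raise ValueError("no examples to score")
    correct = sum(extract_choice(r) == g for r, g in zip(responses, gold))
    return 100.0 * correct / len(gold)


responses = [
    "The red cup occludes the chair, so it is closer. Answer: B",
    "(C)",
    "At first A looks right, but the final Answer: D",
    "no idea",
]
print(mcq_accuracy(responses, ["B", "C", "A", "A"]))  # 50.0
```

Unparseable responses count as wrong rather than being dropped. This keeps the denominator fixed across models.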

Maintenance & Community

SenseNova-SI is described as an "ongoing project" with continuous updates planned, and newly trained models are released publicly. Integration with larger in-house models is anticipated. The README does not list community channels (such as Discord or Slack) or a public roadmap.

Licensing & Compatibility

The project incorporates code from BAGEL, InternVL, and lmms-engine, and directs users to consult the original repositories for licensing details. No explicit license is stated for the SenseNova-SI project itself. Compatibility for commercial use or closed-source linking would depend on the licenses of the underlying base models.

Limitations & Caveats

As an ongoing project, SenseNova-SI is subject to continuous development and updates. The reliance on external base model licenses means users must verify compatibility for their specific use cases. Future integration plans suggest current models may evolve or be superseded.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

11 stars in the last 30 days