Video-XL by VectorSpaceLab

VLM for hour-scale video understanding (research paper)

created 10 months ago
507 stars

Top 62.3% on sourcepulse

Project Summary

This repository provides Video-XL, a family of efficient Vision-Language Models (VLMs) designed for understanding extremely long videos, including hour-scale content. It targets researchers and practitioners in video analysis and multimodal AI who need to reason over long temporal sequences.

How It Works

Video-XL employs a reconstructive token compression strategy to efficiently process thousands of video frames. This method, detailed in Video-XL-Pro, reduces the computational and memory footprint, enabling models with fewer parameters (e.g., 3B) to achieve strong performance on long-form video understanding tasks.
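
To make the effect of token compression on sequence length concrete, here is a minimal sketch. It is not Video-XL's compressor: the reconstructive strategy described above is learned, while this toy swaps in plain average pooling over consecutive tokens. The function name `compress_tokens`, the ratio of 4, and all sizes are assumptions chosen for illustration only.

```python
def compress_tokens(tokens, ratio):
    """Average each run of `ratio` consecutive token vectors into one vector.

    Toy stand-in for a learned compressor (assumption, not Video-XL's method).
    """
    if ratio < 1:
        raise ValueError("ratio must be >= 1")
    compressed = []
    for start in range(0, len(tokens), ratio):
        group = tokens[start:start + ratio]
        dim = len(group[0])
        # Element-wise mean over the vectors in this group.
        compressed.append(
            [sum(vec[i] for vec in group) / len(group) for i in range(dim)]
        )
    return compressed


# Hypothetical sizes: 8 frames x 16 tokens per frame, 4-dimensional embeddings.
frames, tokens_per_frame, dim = 8, 16, 4
tokens = [[float(t)] * dim for t in range(frames * tokens_per_frame)]

compressed = compress_tokens(tokens, ratio=4)
print(len(tokens), "->", len(compressed))  # 128 -> 32
```

The point of the sketch is only that the language model sees a sequence shortened by the compression ratio, which is what lowers compute and memory; how well the compressed tokens preserve the video content depends on the learned method, which this example does not model.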

Quick Start & Requirements

  • Installation: Codebase is provided; specific installation commands are not detailed in the README.
  • Prerequisites: Model weights, plus potentially large datasets for training or evaluation.
  • Resources: Video-XL-Pro can process 10,000 frames on a single 80GB GPU.
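
To put the 10,000-frame figure above in perspective, the arithmetic below converts a frame budget into video duration. The sampling rates are assumptions for illustration; the listing does not state the rate Video-XL-Pro uses.

```python
def video_duration_hours(num_frames, frames_per_second):
    """Hours of video covered by `num_frames` sampled at `frames_per_second`."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return num_frames / frames_per_second / 3600


FRAME_BUDGET = 10_000  # figure quoted for Video-XL-Pro on an 80GB GPU

# Assumed sampling rates (hypothetical, not taken from the README).
for fps in (0.5, 1.0, 2.0):
    hours = video_duration_hours(FRAME_BUDGET, fps)
    print(f"{fps} fps: {hours:.2f} hours")
```

At an assumed 1 frame per second, 10,000 frames cover a little under three hours of video, which is consistent with the hour-scale claim.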

Highlighted Details

  • Achieves hour-scale video understanding capabilities.
  • Video-XL-Pro processes 10,000 frames on an 80GB GPU with a 3B parameter model.
  • Selected for Oral presentation at CVPR 2025.
  • Training data for Video-XL-Pro is released.

Maintenance & Community

  • Project is actively developed with recent updates in April 2025.
  • Mentions CVPR 2025 acceptance.
  • No explicit community links (Discord, Slack) are provided in the README.

Licensing & Compatibility

  • Project content is licensed under Apache License 2.0.
  • Utilizes datasets and checkpoints subject to their original licenses; users must comply with these.
  • Apache 2.0 is generally permissive for commercial use and closed-source linking.

Limitations & Caveats

The README notes that the datasets and checkpoints it relies on carry their own license terms, which users must follow and which may complicate reuse alongside the Apache 2.0 code. It does not provide detailed installation or usage instructions.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

  • 207 stars in the last 90 days
