Video-XL by VectorSpaceLab

VLM for hour-scale video understanding (research paper)

Created 1 year ago
548 stars

Top 58.3% on SourcePulse

Project Summary

This repository provides Video-XL, a family of efficient Vision-Language Models (VLMs) designed for understanding extremely long videos, including hour-scale content. It targets researchers and practitioners in video analysis and multimodal AI who need to process long temporal inputs without a proportional increase in compute and memory.

How It Works

Video-XL employs a reconstructive token compression strategy to process thousands of video frames efficiently. This method, described in the Video-XL-Pro work, reduces the computational and memory footprint, so that models with relatively few parameters (e.g., 3B) can achieve strong performance on long-form video understanding tasks.
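
To make the scale concrete, the sketch below estimates how many visual tokens a long video produces with and without compression. The tokens-per-frame value (144) and the compression ratio (16) are illustrative assumptions, not figures from the README, and `visual_token_count` is a hypothetical helper, not part of the Video-XL codebase. It only counts tokens; it does not implement the reconstructive compression itself.

```python
def visual_token_count(num_frames: int, tokens_per_frame: int,
                       compression_ratio: int = 1) -> int:
    """Estimate total visual tokens for a video.

    Assumes (for illustration only) that compression shrinks each frame's
    token count by `compression_ratio`, rounding up.
    """
    if num_frames < 0 or tokens_per_frame < 0:
        raise ValueError("num_frames and tokens_per_frame must be non-negative")
    if compression_ratio < 1:
        raise ValueError("compression_ratio must be at least 1")
    # Ceiling division: a frame never compresses to a fractional token.
    per_frame = -(-tokens_per_frame // compression_ratio)
    return num_frames * per_frame


# Assumed values: 144 tokens per frame, 16x compression.
raw = visual_token_count(10_000, 144)
compressed = visual_token_count(10_000, 144, compression_ratio=16)
print(f"uncompressed: {raw:,} tokens, compressed: {compressed:,} tokens")
```

Under these assumed numbers, the language model would have to attend over far fewer tokens after compression, which is the kind of saving that lets a small model handle thousands of frames.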

Quick Start & Requirements

  • Installation: Codebase is provided; specific installation commands are not detailed in the README.
  • Prerequisites: Requires access to model weights and potentially large datasets for training/evaluation. The README cites an 80GB GPU as the reference hardware for Video-XL-Pro.
  • Resources: Video-XL-Pro can process 10,000 frames on an 80GB GPU.
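
As a rough illustration of what a 10,000-frame budget means for an hour-long video, the sketch below picks evenly spaced frame indices and reports the resulting sampling rate. Uniform sampling is an assumption made for this example; the README summary does not say how Video-XL selects frames, and the function names are hypothetical.

```python
def sample_frame_indices(total_frames: int, budget: int) -> list[int]:
    """Pick up to `budget` evenly spaced frame indices from a video."""
    if total_frames <= 0 or budget <= 0:
        return []
    if total_frames <= budget:
        # Short video: every frame fits within the budget.
        return list(range(total_frames))
    step = total_frames / budget  # > 1, so truncated indices stay distinct
    return [int(i * step) for i in range(budget)]


def effective_fps(duration_seconds: float, budget: int) -> float:
    """Sampling rate obtained when `budget` frames cover the whole video."""
    return budget / duration_seconds


# One hour of 30 fps video, with the 10,000-frame figure quoted above.
indices = sample_frame_indices(total_frames=3600 * 30, budget=10_000)
print(len(indices), indices[:3], indices[-1])
print(f"{effective_fps(3600, 10_000):.2f} frames sampled per second")
```

With these inputs, a one-hour video is covered at a little under three sampled frames per second, which gives a sense of the temporal resolution available within the stated budget.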

Highlighted Details

  • Achieves hour-scale video understanding capabilities.
  • Video-XL-Pro processes 10,000 frames on an 80GB GPU with a 3B parameter model.
  • Selected for Oral presentation at CVPR 2025.
  • Training data for Video-XL-Pro is released.

Maintenance & Community

  • Project is actively developed with recent updates in April 2025.
  • The README notes the paper's acceptance at CVPR 2025.
  • No explicit community links (Discord, Slack) are provided in the README.

Licensing & Compatibility

  • Project content is licensed under Apache License 2.0.
  • Utilizes datasets and checkpoints subject to their original licenses; users must comply with these.
  • Apache 2.0 is generally permissive for commercial use and closed-source linking.

Limitations & Caveats

The README notes that some datasets and checkpoints carry their own licensing terms, which users must follow and which may restrict how the project can be combined with other work. It also does not provide detailed installation or usage instructions.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 24 stars in the last 30 days
