InternVideo by OpenGVLab

Video foundation models & data for multimodal understanding (research paper)

created 2 years ago
1,984 stars

Top 22.6% on sourcepulse

Project Summary

This repository provides a suite of video foundation models and datasets designed for multimodal understanding and generation. Targeting researchers and developers in computer vision and AI, it offers scalable models and large-scale datasets to advance video-centric AI capabilities.

How It Works

The InternVideo series employs a dual approach of generative and discriminative learning to build comprehensive video understanding models. InternVideo2 scales these models for multimodal tasks, while InternVideo2.5 enhances context modeling for longer, richer video content. The project also includes InternVid, a large-scale video-text dataset, facilitating both understanding and generation tasks.
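As a rough illustration of what combining generative and discriminative learning can look like, the sketch below adds a masked-reconstruction loss (generative) to a symmetric video-text contrastive loss (discriminative). This is a toy example in pure Python, not the project's training code: the function names, the loss weights, and the temperature value are assumptions made for illustration only.

```python
import math


def masked_reconstruction_loss(target, predicted, mask):
    """Generative objective: mean squared error over masked positions only."""
    errors = [(t - p) ** 2 for t, p, m in zip(target, predicted, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _normalize(v):
    norm = math.sqrt(_dot(v, v)) or 1.0
    return [x / norm for x in v]


def _cross_entropy(logits, target_index):
    # Numerically stable log-sum-exp minus the logit of the correct class.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target_index]


def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Discriminative objective: the i-th video should match the i-th text."""
    v = [_normalize(e) for e in video_embs]
    t = [_normalize(e) for e in text_embs]
    n = len(v)
    sim = [[_dot(v[i], t[j]) / temperature for j in range(n)] for i in range(n)]
    video_to_text = sum(_cross_entropy(sim[i], i) for i in range(n)) / n
    text_to_video = sum(
        _cross_entropy([sim[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (video_to_text + text_to_video)


def combined_loss(target, predicted, mask, video_embs, text_embs,
                  gen_weight=1.0, disc_weight=1.0):
    """Weighted sum of both objectives (the weights are hypothetical)."""
    return (gen_weight * masked_reconstruction_loss(target, predicted, mask)
            + disc_weight * contrastive_loss(video_embs, text_embs))
```

In this sketch, correctly paired video and text embeddings yield a contrastive loss near zero, while mismatched pairs are penalized heavily; the reconstruction term only scores positions that were hidden from the model.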

Quick Start & Requirements

  • Installation and usage details are available in the official documentation.
  • Requires Python and relevant deep learning libraries. Specific hardware requirements (e.g., GPUs) may apply depending on the model size.
  • Links: Official Documentation, HuggingFace Models
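Since hardware requirements are not specified, a back-of-the-envelope estimate of weight memory can help when choosing a GPU. The sketch below assumes memory is dominated by model weights and uses the 8B parameter count mentioned on this page; the helper name and the precision table are illustrative, and the result excludes activations, video frame buffers, and any framework overhead.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="fp16"):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3


if __name__ == "__main__":
    for dtype in ("fp32", "fp16", "int8"):
        gib = weight_memory_gib(8_000_000_000, dtype)
        print(f"8B parameters in {dtype}: about {gib:.1f} GiB for weights")
```

Actual usage will be higher than these figures, particularly for long video inputs, so treat the numbers as a lower bound rather than a requirement.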

Highlighted Details

  • Offers a range of model sizes, from smaller distilled versions such as InternVideo2-S/B/L to larger 8B-parameter models.
  • Includes InternVid, a large-scale video-text dataset with 230 million video-text pairs.
  • Supports video instruction tuning for multimodal dialogue systems like VideoChat.
  • Models and datasets are available on HuggingFace.

Maintenance & Community

  • Actively updated with new releases like InternVideo2.5.
  • Community discussion via WeChat groups.
  • Hiring for researchers and engineers in video foundation models.

Licensing & Compatibility

  • The specific license is not explicitly stated in the provided README snippet. Users should verify licensing terms for commercial use or integration into closed-source projects.

Limitations & Caveats

  • The README does not explicitly detail licensing, which may impact commercial adoption. Specific hardware requirements for larger models are not detailed.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 4

Star History

  • 158 stars in the last 90 days
