Video foundation models & data for multimodal understanding (research paper)
Top 21.7% on SourcePulse
This repository provides a suite of video foundation models and datasets designed for multimodal understanding and generation. Targeting researchers and developers in computer vision and AI, it offers scalable models and large-scale datasets to advance video-centric AI capabilities.
How It Works
The InternVideo series employs a dual approach of generative and discriminative learning to build comprehensive video understanding models. InternVideo2 scales these models for multimodal tasks, while InternVideo2.5 enhances context modeling for longer, richer video content. The project also includes InternVid, a large-scale video-text dataset, facilitating both understanding and generation tasks.
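To make the "generative plus discriminative" idea concrete, here is a toy sketch of how two such objectives could be combined into one training loss. It is not taken from the InternVideo code base: it assumes the generative part is a masked reconstruction loss and the discriminative part is a video-text contrastive (InfoNCE-style) loss, and the function names, the temperature value, and the weighting parameter are all hypothetical choices made for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def masked_reconstruction_loss(target, predicted, mask):
    """Generative objective (assumed): mean squared error on masked positions only."""
    errors = [(t - p) ** 2 for t, p, m in zip(target, predicted, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0

def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Discriminative objective (assumed): video-to-text InfoNCE.

    Entry i of video_embs is treated as the match for entry i of text_embs;
    every other text in the batch acts as a negative.
    """
    videos = [normalize(v) for v in video_embs]
    texts = [normalize(t) for t in text_embs]
    total = 0.0
    for i, v in enumerate(videos):
        logits = [dot(v, t) / temperature for t in texts]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        total += log_sum - logits[i]
    return total / len(videos)

def combined_loss(generative, discriminative, weight=1.0):
    """Hypothetical weighted sum of the two objectives."""
    return generative + weight * discriminative

# Toy inputs: three "patch" values with the last two masked,
# and two video/text embedding pairs that are already aligned.
gen = masked_reconstruction_loss([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [False, True, True])
disc = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
total = combined_loss(gen, disc, weight=0.5)
```

In this sketch the reconstruction term rewards recovering hidden content, while the contrastive term rewards placing a video close to its paired text and far from the other texts in the batch. A real model would compute both terms on learned features and backpropagate through them.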
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats