VLog by showlab

Video-language model via generative retrieval of narration vocabulary

Created 3 years ago

588 stars

Top 54.6% on SourcePulse

Project Summary

VLog introduces two novel approaches to video-language understanding: treating video narration as a vocabulary problem and viewing videos as long documents for LLM interaction. This project targets researchers and developers in computer vision and natural language processing, offering new methods for detailed video analysis and conversational interaction with video content.

How It Works

The "Video Narration as Vocabulary" approach uses a GPT2-based video narrator that applies generative retrieval over a narration vocabulary, treating narration as a vocabulary problem. The stated aim is efficient, comprehensive video narration. The "Video as Long Document" approach converts a video into a text document that covers both visual and audio information, so that an LLM can discuss the video content in conversation.
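To make the narration-vocabulary idea more concrete, here is a minimal sketch of retrieval over a fixed set of narrations. It is not VLog's implementation: the vocabulary entries, the three-dimensional embeddings, and the names `cosine` and `retrieve_narrations` are all hypothetical, and plain cosine similarity stands in for whatever scoring the GPT2-based narrator actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_narrations(query, vocabulary, top_k=1):
    """Return the top_k narration strings whose embeddings best match the query."""
    ranked = sorted(
        vocabulary.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]

# Hypothetical narration vocabulary: each narration is one retrievable entry.
# The embeddings are made-up toy values, not output from any real model.
NARRATION_VOCAB = {
    "cutting an onion": [0.9, 0.1, 0.0],
    "washing dishes": [0.1, 0.9, 0.1],
    "opening a door": [0.0, 0.2, 0.9],
}

# Hypothetical embedding produced for one video clip.
query_embedding = [0.8, 0.2, 0.1]
best = retrieve_narrations(query_embedding, NARRATION_VOCAB, top_k=2)
print(best)
```

The point of the sketch is the framing: the narrator selects whole narrations from a vocabulary instead of composing free text. How VLog builds and scores that vocabulary is not described in this summary.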

Highlighted Details

Presents two distinct methodologies for video-language understanding.
"Video Narration as Vocabulary" uses Generative Retrieval with a GPT2-based narrator.
"Video as Long Document" enables LLM-based chat over video content by converting video to text.
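The "Video as Long Document" idea can be sketched as follows, assuming the video has already been split into segments with a visual caption and an optional audio transcript. The `Segment` type, the line format, and the prompt layout are hypothetical choices made for illustration. They are not taken from the VLog code, and no real captioning, speech-recognition, or LLM API is called.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Hypothetical per-segment record: times in seconds plus extracted text."""
    start: float
    end: float
    caption: str          # text describing the visual content
    transcript: str = ""  # text from the audio track, if any

def format_time(seconds):
    """Render seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"

def video_to_document(segments):
    """Merge visual and audio text into one timestamped document."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        line = f"[{format_time(seg.start)}-{format_time(seg.end)}] Visual: {seg.caption}"
        if seg.transcript:
            line += f" | Audio: {seg.transcript}"
        lines.append(line)
    return "\n".join(lines)

def build_prompt(document, question):
    """Wrap the document and a user question into a single LLM prompt string."""
    return f"Video document:\n{document}\n\nQuestion: {question}\nAnswer:"

segments = [
    Segment(65, 70, "a person chops vegetables", "now add the onions"),
    Segment(0, 5, "a kitchen counter"),
]
document = video_to_document(segments)
print(build_prompt(document, "What is being cooked?"))
```

Once the video is plain text, any chat-capable LLM can be given the prompt string. Which captioning, speech, and language models VLog itself uses is not stated in this summary.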

Maintenance & Community

This project is associated with CVPR 2025. Further community or maintenance details are not provided in the README.

Licensing & Compatibility

The licensing information is not specified in the provided README.

Limitations & Caveats

The project is presented as a CVPR 2025 submission, suggesting it may be in a research or pre-publication phase. Specific implementation details, performance benchmarks, and compatibility requirements are not detailed in this summary.
