Research paper for video understanding/generation via improved captions
ShareGPT4Video provides an official implementation for improving video understanding and generation through enhanced video captions. It offers a large-scale dataset of 40K GPT-4 Vision-generated captions and a general video captioner, ShareCaptioner-Video, capable of handling diverse video formats. The project also releases ShareGPT4Video-8B, a large video-language model, and demonstrates its utility in improving text-to-video performance.
How It Works
The project leverages GPT-4 Vision to generate high-quality, descriptive captions for videos, addressing limitations in existing video-text datasets. ShareCaptioner-Video is designed as a versatile video captioning model, offering two inference modes for balancing quality and efficiency. The ShareGPT4Video-8B model is a large video-language model fine-tuned using these enhanced captions, aiming to improve video comprehension and generation capabilities.
Quick Start & Requirements
Install with:
pip install -e .
pip install -e ".[train]"
flash-attn is also recommended.
For direct use, run:
python run.py --model-path Lin-Chen/sharegpt4video-8b --video examples/yoga.mp4 --query "Describe this video in detail."
Local demos are available via python app.py for both models.
Highlighted Details
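As a minimal sketch of scripting the direct-use command from Quick Start, the Python snippet below assembles the run.py invocation as an argument list so that the multi-word query is passed as a single argument. The helper name build_command is hypothetical and not part of the repository; the flags, model path, and example video are the ones from the quick-start command. It is assumed that the script is run from the repository root.

```python
import shlex
import sys

# Values copied from the quick-start command.
MODEL_PATH = "Lin-Chen/sharegpt4video-8b"
DEFAULT_QUERY = "Describe this video in detail."


def build_command(video, query=DEFAULT_QUERY, model_path=MODEL_PATH):
    """Return the argv list for one run.py call (hypothetical helper)."""
    return [
        sys.executable, "run.py",
        "--model-path", model_path,
        "--video", str(video),
        "--query", query,  # a single list element, so no shell quoting is needed
    ]


# Dry run: show the command that would be executed for the bundled example.
print(shlex.join(build_command("examples/yoga.mp4")))
```

To actually launch inference, the returned list can be passed to subprocess.run(cmd, check=True) from the standard library; looping over several video paths gives a simple batch run.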
Maintenance & Community
The project is associated with multiple research institutions and authors. Links to related works (MMStar, ShareGPT4V) and demos are available. The project page and dataset were released on May 26, 2024, with the ShareGPT4Video-8B model following on May 27, 2024.
Licensing & Compatibility
The repository does not explicitly state a license in the README. However, the project is built upon LLaVA and integrates with Open-Sora-Plan and Open-LLaVA-NeXT, which may have their own licensing terms. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The README mentions a "Todo" list that is now marked as completed, indicating active development and release of core features. No specific limitations or known bugs are detailed.