ShareGPT4Video by ShareGPT4Omni

Official implementation of a research paper on improving video understanding and generation via better captions

created 1 year ago
1,073 stars

Top 35.9% on sourcepulse

Project Summary

ShareGPT4Video provides an official implementation for improving video understanding and generation through enhanced video captions. It offers a large-scale dataset of 40K GPT-4 Vision-generated captions and a general video captioner, ShareCaptioner-Video, capable of handling diverse video formats. The project also releases ShareGPT4Video-8B, a large video-language model, and demonstrates its utility in improving text-to-video performance.

How It Works

The project leverages GPT-4 Vision to generate high-quality, descriptive captions for videos, addressing limitations in existing video-text datasets. ShareCaptioner-Video is designed as a versatile video captioning model, offering two inference modes for balancing quality and efficiency. The ShareGPT4Video-8B model is a large video-language model fine-tuned using these enhanced captions, aiming to improve video comprehension and generation capabilities.
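
As a rough illustration of how the released captions could be inspected, the sketch below streams a few records with the Hugging Face datasets library; the repository ID and the record fields are assumptions for illustration, not taken from the README.

  # Minimal sketch: peek at the ShareGPT4Video captions via the Hugging Face
  # `datasets` library. The repository ID below is a hypothetical placeholder;
  # check the project's Hugging Face page for the actual dataset name.
  from datasets import load_dataset

  ds = load_dataset("ShareGPT4Video/ShareGPT4Video",  # hypothetical repo ID
                    split="train", streaming=True)
  for i, sample in enumerate(ds):
      print(sample)  # each record should pair a video identifier with its GPT-4V caption
      if i == 2:     # only look at the first few records
          break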

Quick Start & Requirements

  • Install: pip install -e . for inference, plus pip install -e ".[train]" for training dependencies. Installing flash-attn is also recommended.
  • Prerequisites: Python 3.10, PyTorch.
  • Usage: Run python run.py --model-path Lin-Chen/sharegpt4video-8b --video examples/yoga.mp4 --query "Describe this video in detail." for direct inference (the query must be quoted because it contains spaces); a scripted variant is sketched after this list. Local demos for both models are available via python app.py.
  • Resources: Links to Colab, HuggingFace models, and demos are provided.
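
For a scripted version of the quick start, the sketch below downloads the published ShareGPT4Video-8B weights and then calls the repository's run.py with the documented flags. snapshot_download and subprocess.run are standard APIs; the assumption that --model-path also accepts a local checkpoint directory is not confirmed by the README.

  # Sketch of the quick-start flow driven from Python. Assumes the repository
  # has been installed with `pip install -e .` and that run.py accepts a local
  # checkpoint directory for --model-path (an assumption, not stated in the README).
  import subprocess
  from huggingface_hub import snapshot_download

  # Fetch the ShareGPT4Video-8B weights from the Hugging Face Hub.
  ckpt_dir = snapshot_download(repo_id="Lin-Chen/sharegpt4video-8b")

  # Invoke the documented CLI entry point on the bundled example video.
  subprocess.run(
      ["python", "run.py",
       "--model-path", ckpt_dir,
       "--video", "examples/yoga.mp4",
       "--query", "Describe this video in detail."],
      check=True,
  )

If the explicit download step is unnecessary, the Hugging Face repo ID from the README command can be passed to --model-path directly.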

Highlighted Details

  • Paper accepted to the NeurIPS 2024 Datasets and Benchmarks (D&B) track.
  • 40K GPT-4 Vision-generated video captions and ~400K implicit split captions.
  • ShareCaptioner-Video offers quality and efficiency inference modes.
  • ShareGPT4Video-8B model released, trained on 8xA100 GPUs for 5 hours.
  • Improves Text-to-Video performance, integrated with Open-Sora-Plan.

Maintenance & Community

The project is associated with multiple research institutions and authors. Links to related works (MMStar, ShareGPT4V) and demos are available. The project page and dataset were released on May 26, 2024, with the ShareGPT4Video-8B model following on May 27, 2024.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, the project is built upon LLaVA and integrates with Open-Sora-Plan and Open-LLaVA-NeXT, which may have their own licensing terms. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README's to-do list is marked as fully completed, indicating that the core features have all been released. No specific limitations or known bugs are documented.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 20 stars in the last 90 days

