UniVTG by showlab

Video-language temporal grounding model

created 2 years ago
357 stars

Top 79.4% on sourcepulse

Project Summary

UniVTG is a video-language temporal grounding model whose pretraining unifies diverse temporal annotations. It covers moment retrieval, highlight detection, and video summarization, and is aimed at researchers and practitioners in video understanding and multimodal AI. Its main benefit is a single framework that improves performance across these temporal grounding tasks.

How It Works

UniVTG pretrains on three kinds of temporal annotation (interval, curve, and point) under one unified formulation. Because the model learns a shared representation of how text relates to time in a video, it can adapt to different grounding granularities without task-specific architectural changes.
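To make the idea of unifying interval, curve, and point annotations concrete, here is a minimal sketch of one way such labels could be mapped onto a shared per-clip format. It is an illustration, not the repository's code: the `ClipLabel` fields, the function names, the fixed clip length, and the 0.5 threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClipLabel:
    """Hypothetical shared per-clip label (names are illustrative)."""
    foreground: int                # 1 if the clip belongs to the target, else 0
    offsets: Tuple[float, float]   # seconds from clip centre to interval start / end
    saliency: float                # relevance score in [0, 1]


def clip_centers(duration: float, clip_len: float) -> List[float]:
    """Centre timestamp of each fixed-length clip that fits in the video."""
    n = int(duration // clip_len)
    return [(i + 0.5) * clip_len for i in range(n)]


def from_interval(duration: float, clip_len: float,
                  start: float, end: float) -> List[ClipLabel]:
    """Interval annotation: clips whose centre falls in [start, end] are foreground."""
    labels = []
    for t in clip_centers(duration, clip_len):
        inside = start <= t <= end
        offsets = (t - start, end - t) if inside else (0.0, 0.0)
        labels.append(ClipLabel(int(inside), offsets, 1.0 if inside else 0.0))
    return labels


def from_curve(scores: Sequence[float], threshold: float = 0.5) -> List[ClipLabel]:
    """Curve annotation: one score per clip; offsets are unknown, so left at zero."""
    return [ClipLabel(int(s >= threshold), (0.0, 0.0), float(s)) for s in scores]


def from_point(duration: float, clip_len: float, timestamp: float) -> List[ClipLabel]:
    """Point annotation: only the clip containing the timestamp is foreground."""
    n = int(duration // clip_len)
    hit = min(int(timestamp // clip_len), n - 1)
    return [ClipLabel(int(i == hit), (0.0, 0.0), 1.0 if i == hit else 0.0)
            for i in range(n)]


if __name__ == "__main__":
    for label in from_interval(duration=10.0, clip_len=2.0, start=3.0, end=7.0):
        print(label)
```

With every annotation type expressed in the same per-clip fields, a single model head per field can in principle be trained on all of them, which is the intuition behind avoiding task-specific architectures.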

Quick Start & Requirements

  • Install: Follow instructions in install.md.
  • Prerequisites: Python, PyTorch. Specific dependencies detailed in install.md.
  • Demo: Hugging Face Space demo available.
  • Resources: Inference runs on a single GPU with under 4 GB of memory; pretraining requires multiple GPUs.
  • Links: arXiv, Hugging Face Space Demo, Model Zoo

Highlighted Details

  • Reports state-of-the-art results on the QVHighlights (QVHL), Charades-STA, and NLQ benchmarks.
  • Efficient inference: under 1 second for a 10-minute video on a single GPU (under 4 GB VRAM).
  • Supports scalable pseudo-annotation generation using CLIP teacher models.
  • Unified pretraining framework for diverse temporal grounding tasks.
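The pseudo-annotation bullet above can be illustrated with a small sketch. It assumes that per-clip video embeddings and a text embedding have already been extracted by a CLIP-style teacher, and shows how a similarity curve could be turned into interval and point pseudo-labels. The function names, the threshold, and the longest-run heuristic are assumptions for this example and are not taken from the UniVTG codebase.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def pseudo_curve(clip_feats: Sequence[Sequence[float]],
                 text_feat: Sequence[float]) -> List[float]:
    """Curve pseudo-label: one teacher similarity score per clip."""
    return [cosine(c, text_feat) for c in clip_feats]


def curve_to_interval(scores: Sequence[float], clip_len: float,
                      threshold: float) -> Optional[Tuple[float, float]]:
    """Interval pseudo-label: longest contiguous run of clips scoring >= threshold."""
    best = None
    run_start = None
    # A trailing sentinel closes any run that reaches the end of the video.
    for i, s in enumerate(list(scores) + [float("-inf")]):
        if s >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    if best is None:
        return None
    return (best[0] * clip_len, best[1] * clip_len)


def curve_to_point(scores: Sequence[float], clip_len: float) -> float:
    """Point pseudo-label: centre timestamp of the highest-scoring clip."""
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return (idx + 0.5) * clip_len


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real teacher features.
    clips = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.1]]
    text = [1.0, 0.0]
    curve = pseudo_curve(clips, text)
    print(curve_to_interval(curve, clip_len=2.0, threshold=0.5))
    print(curve_to_point(curve, clip_len=2.0))
```

Because the labels come from a frozen teacher rather than human annotators, this kind of procedure can be run over large unlabeled video collections, which is what makes the approach scalable.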

Maintenance & Community

  • Maintained by Kevin (kevin.qh.lin@gmail.com).
  • Codebase based on moment_detr, HERO_Video_Feature_Extractor, UMT.
  • Open to questions and discussions via email or GitHub issues.

Licensing & Compatibility

  • License not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README mentions a "Todo" item to connect UniVTG with LLMs like ChatGPT, indicating this integration is not yet implemented. Training instructions are geared towards Slurm, potentially requiring adaptation for non-Slurm environments.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 9 stars in the last 90 days
