QD-DETR by wjun0830

Query-dependent video representation for temporal grounding

Created 3 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

Summary

QD-DETR is the official PyTorch implementation of a CVPR 2023 paper on query-dependent video representation for moment retrieval and highlight detection. It gives researchers and practitioners a framework for improving temporal grounding in videos by making video representations query-aware.

How It Works

Building on the Moment-DETR architecture, QD-DETR introduces a novel query-dependent video representation. This approach aims to improve localization accuracy by conditioning video features on the specific query. The system utilizes pre-computed video features (e.g., I3D) and supports multi-modal training (video+audio). Pretraining on ASR captions is also available to boost performance.
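One common way to condition video features on a text query is cross-attention, in which each video clip attends over the query's tokens. The sketch below illustrates that idea in plain Python on toy feature vectors of equal dimension. The function names (`softmax`, `dot`, `query_conditioned_clips`) are hypothetical, the learned projection matrices of a real attention layer are omitted, and none of this is taken from the QD-DETR codebase.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def query_conditioned_clips(clip_feats, token_feats):
    """Illustrative cross-attention (an assumption, not the repo's code).

    Each clip feature acts as the attention query; the text-token
    features act as keys and values. The attended text context is added
    back to the clip feature through a residual connection.
    """
    dim = len(token_feats[0])
    scale = 1.0 / math.sqrt(dim)
    conditioned = []
    for clip in clip_feats:
        # Scaled dot-product similarity between this clip and every token.
        weights = softmax([dot(clip, tok) * scale for tok in token_feats])
        # Weighted sum of token features: the text context for this clip.
        context = [
            sum(w * tok[d] for w, tok in zip(weights, token_feats))
            for d in range(dim)
        ]
        # Residual connection keeps the original visual information.
        conditioned.append([c + x for c, x in zip(clip, context)])
    return conditioned
```

Calling `query_conditioned_clips` with the same clip features but different token features yields different outputs, which is the sense in which the resulting representation depends on the query.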

Quick Start & Requirements

  • Prerequisites: Python 3.7, PyTorch.
  • Installation: Clone repo, pip install -r requirements.txt.
  • Datasets: Download the QVHighlights features (moment_detr_features.tar.gz, 8GB; the original download link has expired) and the TVSum features (69.1MB). Refer to the CG-DETR GitHub repository for updated instructions.
  • Training: Scripts provided for QVHighlights (train.sh, train_audio.sh) and TVSum (tvsum/train_tvsum.sh).
  • Inference: inference.sh for generating submissions.
  • Pretraining/Finetuning: pretrain.sh for pretraining; fine-tune from a pretrained checkpoint by running train.sh with --resume.
  • Links: Paper, Project Page, Video, CG-DETR.
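Because setup depends on a specific Python version and on the scripts named above being present, a small pre-flight check can catch problems before a training run. The sketch below is a hypothetical helper, not part of the repository: `missing_files` and `python_matches` are illustrative names, and the summary does not state where the scripts live inside the repository, so any relative paths passed in are assumptions to adjust.

```python
import sys
from pathlib import Path


def missing_files(repo_root, relative_paths):
    """Return the entries of relative_paths that are not files under repo_root."""
    root = Path(repo_root)
    return [p for p in relative_paths if not (root / p).is_file()]


def python_matches(required=(3, 7), actual=None):
    """Check that the (major, minor) Python version equals the required one.

    `actual` defaults to the running interpreter; passing it explicitly
    makes the check easy to test.
    """
    actual = sys.version_info[:2] if actual is None else tuple(actual)
    return actual == tuple(required)
```

For example, `missing_files("QD-DETR", ["requirements.txt", "train.sh"])` lists whichever of those files are absent under a local clone named `QD-DETR` (both the clone name and the flat script location are placeholders), and `python_matches()` reports whether the current interpreter is Python 3.7.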

Highlighted Details

  • Official PyTorch implementation for CVPR 2023 paper on Query-Dependent Video Representation.
  • Novel query-dependent video representation for temporal grounding.
  • Supports moment retrieval and highlight detection tasks.
  • Pre-trained checkpoints for QVHighlights (Video+Audio, Video only) are available.
  • Codebase is derived from the official Moment-DETR repository.

Maintenance & Community

Maintained by authors WonJun Moon, SangEek Hyun, SangUk Park, Dongchan Park, and Jae-Pil Heo. Contact: wjun0830@gmail.com, hse1032@gmail.com. The project points to a newer, related project, CG-DETR, for updated research and instructions.

Licensing & Compatibility

Code is released under the MIT license. Parts of the implementation and annotation files are borrowed from Moment-DETR.

Limitations & Caveats

  • Requires Python 3.7, an older release that has reached end of life.
  • Dataset preparation involves large downloads and specific directory structures; a key feature download link is expired.
  • The README directs users to CG-DETR for updated instructions, suggesting QD-DETR may be superseded.
  • An update clarified which features (C3D vs. I3D) were used for Charades-STA, so earlier descriptions of the experimental setup may have been ambiguous.

Health Check

  • Last Commit: 10 months ago.
  • Responsiveness: Inactive.
  • Pull Requests (30d): 0.
  • Issues (30d): 0.

Star History

3 stars in the last 30 days
