QD-DETR by wjun0830

Query-dependent video representation for temporal grounding

Created 3 years ago

251 stars

Top 99.8% on SourcePulse

Project Summary

Summary

QD-DETR is the official PyTorch implementation of a CVPR 2023 paper on query-dependent video representation for moment retrieval and highlight detection. It gives researchers and practitioners a framework that improves temporal grounding in videos by making the video representation query-aware.

How It Works

QD-DETR builds on the Moment-DETR architecture and adds a query-dependent video representation: video features are conditioned on the text query, with the aim of improving localization accuracy. The system works from pre-computed video features (e.g., I3D) and supports multi-modal training (video+audio). Pretraining on ASR captions is also available to boost performance.
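One common way to condition video features on a text query is cross-attention, where each video clip attends over the query's word embeddings. The sketch below illustrates that idea in plain Python under stated assumptions: it omits learned projections, multiple heads, and normalization, and the function name `query_conditioned_clips` is hypothetical. It is not the repository's code, and whether the repository's layers match it exactly is an assumption.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_conditioned_clips(clips: List[Vector], words: List[Vector]) -> List[Vector]:
    """Hypothetical helper: condition clip features on text-query tokens.

    Each clip acts as the attention query; the text tokens act as keys and
    values. The attended text context is added back to the clip (residual),
    so the output keeps one vector per clip with the same dimension.
    """
    scale = math.sqrt(len(words[0]))
    conditioned = []
    for clip in clips:
        # Scaled dot-product similarity between this clip and every word.
        weights = softmax([dot(clip, word) / scale for word in words])
        # Weighted average of the word embeddings.
        context = [
            sum(weight * word[d] for weight, word in zip(weights, words))
            for d in range(len(clip))
        ]
        conditioned.append([c + x for c, x in zip(clip, context)])
    return conditioned
```

With a single text token the attention weight is 1, so each clip simply has that token's embedding added to it. With several tokens, clips that align with different words receive different contexts, which is what makes the resulting representation depend on the query.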

Quick Start & Requirements

Prerequisites: Python 3.7, PyTorch.
Installation: Clone the repository, then run pip install -r requirements.txt.
Datasets: Download the QVHighlights features (8 GB, moment_detr_features.tar.gz; the original link has expired) and the TVSum features (69.1 MB). Refer to the CG-DETR GitHub repository for updated instructions.
Training: Scripts provided for QVHighlights (train.sh, train_audio.sh) and TVSum (tvsum/train_tvsum.sh).
Inference: inference.sh for generating submissions.
Pretraining/Finetuning: pretrain.sh for pretraining; finetuning via train.sh --resume.
Links: Paper, Project Page, Video, CG-DETR.
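For the dataset step above, the feature archive has to be unpacked before training. The following is a minimal sketch using Python's standard `tarfile` module. The helper name `extract_features` and the destination directory are hypothetical, and the directory layout the training scripts expect is not stated here, so check the repository's instructions before relying on it.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_features(archive: str, dest: str) -> List[str]:
    """Hypothetical helper: unpack a .tar.gz feature archive into dest.

    Returns the sorted names of the top-level entries found in dest after
    extraction, so the resulting layout can be inspected.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    root = dest_path.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        # Refuse archive members that would be written outside dest.
        for member in tar.getmembers():
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(root)
    return sorted(p.name for p in dest_path.iterdir())
```

A call such as `extract_features("moment_detr_features.tar.gz", "features")` would unpack the QVHighlights archive into a `features` directory, where `features` is a placeholder rather than a path the repository prescribes.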

Highlighted Details

Official PyTorch implementation for CVPR 2023 paper on Query-Dependent Video Representation.
Novel query-dependent video representation for temporal grounding.
Supports moment retrieval and highlight detection tasks.
Pre-trained checkpoints for QVHighlights (Video+Audio, Video only) are available.
Codebase is derived from the official Moment-DETR repository.

Maintenance & Community

Maintained by the authors: WonJun Moon, SangEek Hyun, SangUk Park, Dongchan Park, and Jae-Pil Heo. Contact: wjun0830@gmail.com, hse1032@gmail.com. The project points to a newer, related project, CG-DETR, for updated research and instructions.

Licensing & Compatibility

The code is released under the MIT license. Parts of the implementation and the annotation files are borrowed from Moment-DETR.

Limitations & Caveats

Requires Python 3.7, an older release that has reached end of life.
Dataset preparation involves large downloads and specific directory structures; a key feature download link is expired.
The README directs users to CG-DETR for updated instructions, suggesting QD-DETR may be superseded.
An update clarified which features (C3D vs. I3D) were used for the Charades-STA experiments, so earlier descriptions may have been ambiguous.

Health Check

Last Commit: 11 months ago

Responsiveness: Inactive

Star History

1 star in the last 30 days