QD-DETR by wjun0830: Query-dependent video representation for temporal grounding
Summary
QD-DETR is the official PyTorch implementation of the CVPR 2023 paper "Query-Dependent Video Representation for Moment Retrieval and Highlight Detection." It gives researchers and practitioners a framework for improving temporal grounding in videos by making video representations query-aware.
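Moment retrieval predictions are commonly scored by temporal IoU between a predicted window and a ground-truth window, reported as Recall@1 at thresholds such as 0.5 and 0.7. The sketch below illustrates that metric under simplifying assumptions: windows are (start, end) pairs in seconds and each query has exactly one ground-truth window. The function names are hypothetical and are not taken from the repository's evaluation code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU of two windows given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(top1_preds, gts, thresh=0.5):
    """Fraction of queries whose top-1 window reaches IoU >= thresh.

    Simplification (assumption): one ground-truth window per query.
    """
    hits = sum(temporal_iou(p, g) >= thresh for p, g in zip(top1_preds, gts))
    return hits / len(gts)


print(temporal_iou((10, 20), (15, 25)))  # 5 s overlap / 15 s union
print(recall_at_1([(10, 20), (0, 5)], [(12, 20), (30, 40)]))  # 0.5
```

Benchmarks with several ground-truth windows per query typically take the best IoU over all of them; the single-window version keeps the arithmetic easy to follow.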
How It Works
QD-DETR builds on the Moment-DETR architecture and introduces a query-dependent video representation: video features are conditioned on the specific text query, with the goal of improving localization accuracy. It uses pre-extracted video features (e.g., I3D) and supports multi-modal training on video plus audio. Optional pretraining on ASR captions is also available to further improve performance.
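One common way to condition video features on a text query is cross-attention, where each video clip acts as an attention query over the text-token features. The following is a minimal sketch of that idea, not the repository's code: it assumes identity projections, a single head, no training, and plain Python lists, and the helper names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def query_conditioned_clips(clips, words):
    """Condition each clip feature on the text query via single-head cross-attention.

    clips: list of clip feature vectors (the attention queries).
    words: list of text-token feature vectors (keys and values), same dimension.
    Returns one updated vector per clip: clip + attention-weighted sum of words.
    Hypothetical helper; identity projections stand in for learned ones.
    """
    d = len(clips[0])
    out = []
    for c in clips:
        # Scaled dot-product score between this clip and every word.
        scores = [sum(ci * wi for ci, wi in zip(c, w)) / math.sqrt(d) for w in words]
        attn = softmax(scores)
        # Attention-weighted average of word features (the "context" vector).
        ctx = [sum(a * w[k] for a, w in zip(attn, words)) for k in range(d)]
        # Residual connection keeps the original visual content.
        out.append([ci + xi for ci, xi in zip(c, ctx)])
    return out


clips = [[1.0, 0.0], [0.0, 1.0]]
print(query_conditioned_clips(clips, [[1.0, 0.0]]))  # [[2.0, 0.0], [1.0, 1.0]]
print(query_conditioned_clips(clips, [[0.0, 1.0]]))  # [[1.0, 1.0], [0.0, 2.0]]
```

The same clips yield different representations for different queries, which is the query dependence the description refers to. A practical model would add learned query/key/value projections, multiple heads, and stacked layers.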
Quick Start & Requirements
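Install dependencies with pip install -r requirements.txt.
Pre-extracted features are required: the Moment-DETR feature archive (moment_detr_features.tar.gz; link expired) and the TVSum features (69.1MB). Refer to the CG-DETR GitHub repository for updated instructions.
Training scripts are provided (train.sh, train_audio.sh), plus tvsum/train_tvsum.sh for TVSum.
Use inference.sh to generate submissions.
Pretraining is available via pretrain.sh, with finetuning via train.sh --resume.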
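The submission files produced by inference.sh are not described here. Assuming the repository keeps the QVHighlights JSONL layout used by Moment-DETR (one JSON object per line with qid, query, vid, pred_relevant_windows as [start, end, score] triples, and pred_saliency_scores), a single prediction line could be built as sketched below. The helper name and all example values are hypothetical.

```python
import json


def make_submission_line(qid, query, vid, windows, saliency):
    """Serialize one prediction as a JSONL line (assumed Moment-DETR/QVHighlights layout).

    windows: list of (start_sec, end_sec, score); saliency: one score per clip.
    """
    # Highest-confidence window first.
    ranked = sorted(windows, key=lambda w: w[2], reverse=True)
    record = {
        "qid": qid,
        "query": query,
        "vid": vid,
        "pred_relevant_windows": [[float(s), float(e), float(c)] for s, e, c in ranked],
        "pred_saliency_scores": [float(x) for x in saliency],
    }
    return json.dumps(record)


line = make_submission_line(
    1, "a man opens a door", "vid_001", [(0, 10, 0.2), (20, 30, 0.9)], [0.1, 0.8]
)
print(line)
```

Sorting by score puts the most confident window first, which matters if the evaluator treats the first window as the top-1 prediction; check the evaluation script before relying on that assumption.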
Maintenance & Community
Maintained by the authors WonJun Moon, SangEek Hyun, SangUk Park, Dongchan Park, and Jae-Pil Heo. Contact: wjun0830@gmail.com, hse1032@gmail.com. The project points to a newer, related project, CG-DETR, for updated research and instructions.
Licensing & Compatibility
Code is released under the MIT license. Parts of the implementation and the annotation files are borrowed from Moment-DETR.
Limitations & Caveats
10 months ago
Inactive