TimeChat  by RenShuhuai-Andy

Multimodal LLM for long video understanding (research paper)

created 1 year ago
384 stars

Top 75.6% on sourcepulse

Project Summary

TimeChat is a multimodal large language model for long video understanding, with an emphasis on time-sensitive tasks. It targets researchers and developers working on video analysis. By integrating timestamp information directly into the model's architecture, it aims to provide accurate temporal localization, dense captioning, and highlight detection.

How It Works

TimeChat employs a timestamp-aware frame encoder that binds the visual content of each frame to its timestamp. A sliding video Q-Former then generates a video token sequence whose length varies with the input, allowing the model to process videos of diverse durations efficiently. Together, these two components let the model relate what happens in a video to when it happens.
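The variable-length token sequence can be illustrated with a small sketch. It is not TimeChat's implementation: the uniform frame sampling, the window size, the stride, and the number of tokens per window are all hypothetical values chosen for illustration. The sketch only shows why a sliding window yields more tokens for longer videos.

```python
def frame_timestamps(num_frames, duration_s):
    """Timestamps in seconds for frames sampled uniformly over the video.

    Uniform sampling is an assumption made for this sketch.
    """
    if num_frames <= 0:
        return []
    step = duration_s / num_frames
    return [round(i * step, 2) for i in range(num_frames)]


def sliding_windows(num_frames, window, stride):
    """Return (start, end) frame-index pairs that cover all frames."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if num_frames <= 0:
        return []
    starts = list(range(0, max(num_frames - window, 0) + 1, stride))
    # Add one more window if the last regular window leaves tail frames uncovered.
    if starts[-1] + window < num_frames:
        starts.append(starts[-1] + stride)
    return [(s, min(s + window, num_frames)) for s in starts]


def video_token_count(num_frames, window, stride, tokens_per_window):
    """Total video tokens if every window is compressed to a fixed token budget."""
    return len(sliding_windows(num_frames, window, stride)) * tokens_per_window


if __name__ == "__main__":
    # Hypothetical settings: 32-frame windows, stride 32, 32 tokens per window.
    for frames in (10, 96, 100):
        print(frames, sliding_windows(frames, 32, 32),
              video_token_count(frames, 32, 32, 32))
```

With these hypothetical settings, a 96-frame input produces three windows and a 100-frame input produces four, so the token count grows with video length instead of being fixed.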

Quick Start & Requirements

  • Install: Create a conda environment using environment.yml, activate it (conda activate timechat), and install PyTorch with CUDA 11.3 support (pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113).
  • Prerequisites: ffmpeg, plus pre-trained EVA ViT-g, InstructBLIP Q-Former, LLaMA-2-7B, and Video-LLaMA-2-7B checkpoints.
  • Resources: Instruction-tuning requires 8x V100 (32G) GPUs; inference requires 1x A100 (40G/80G) or A6000.
  • Demo: A Jupyter Notebook demo is available.
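Before running the demo, it can help to confirm that the prerequisites listed above are in place. The following is a minimal sketch, not part of the TimeChat repository: the function name `check_prerequisites` and the checkpoint paths are hypothetical placeholders that you would replace with your own locations.

```python
import shutil
from pathlib import Path


def check_prerequisites(checkpoints, tools=("ffmpeg",)):
    """Return a list of missing items; an empty list means everything was found.

    checkpoints maps a human-readable name to a local path (hypothetical paths).
    """
    missing = []
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(f"tool:{tool}")
    for name, path in checkpoints.items():
        if not Path(path).exists():
            missing.append(f"checkpoint:{name}")
    return missing


if __name__ == "__main__":
    # Placeholder locations; adjust to wherever you stored the checkpoints.
    checkpoints = {
        "EVA ViT-g": "ckpt/eva-vit-g",
        "InstructBLIP Q-Former": "ckpt/instructblip-qformer",
        "LLaMA-2-7B": "ckpt/llama-2-7b",
        "Video-LLaMA-2-7B": "ckpt/video-llama-2-7b",
    }
    for item in check_prerequisites(checkpoints):
        print("missing", item)
```

The check only verifies that ffmpeg is on the PATH and that each path exists; it does not validate the contents of any checkpoint.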

Highlighted Details

  • Fine-tuned checkpoint (TimeChat-7b) released, based on LLaMA-2 7B.
  • Released TimeIT dataset (104K instances) for time-sensitive instruction tuning.
  • Zero-shot evaluation results on benchmarks like VideoMME, MVBench, and TempCompass are available.
  • Supports tasks including temporal localization, dense video captioning, and video highlight detection.

Maintenance & Community

The project is associated with CVPR 2024. Links to FAQ and evaluation details are provided.

Licensing & Compatibility

The project is intended for non-commercial research use only.

Limitations & Caveats

The model is released as a research preview. Using it for illegal, harmful, violent, racist, or sexual purposes is strictly prohibited.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 23 stars in the last 90 days
