huangb23/VTimeLLM: Video LLM for fine-grained video moment understanding
VTimeLLM is a PyTorch implementation for fine-grained video moment understanding and temporal reasoning, targeting researchers and developers in video-language modeling. It offers enhanced temporal awareness and intent alignment for LLMs processing video content.
How It Works
VTimeLLM employs a novel boundary-aware three-stage training strategy. It first aligns features using image-text pairs, then enhances temporal-boundary awareness with multi-event videos and temporal QA, and finally refines temporal understanding and human intent alignment through instruction tuning on high-quality dialogue datasets. This approach aims to outperform existing Video LLMs in fine-grained temporal tasks.
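To make the schedule concrete, the following is a minimal, purely illustrative sketch of how the three stages could be enumerated in code. The stage names, the projector-only first stage, and the use of LoRA adapters in later stages are assumptions made for illustration rather than details taken from this summary; consult train.md for the actual recipe.

```python
from dataclasses import dataclass

# Conceptual sketch of the boundary-aware three-stage schedule described above.
# Field values are illustrative assumptions, not the repository's actual configs.

@dataclass
class TrainingStage:
    name: str
    data: str        # dataset used in this stage
    trainable: str   # which parameters are updated (assumed)
    objective: str   # what the stage is meant to teach the model

STAGES = [
    TrainingStage(
        name="stage1_feature_alignment",
        data="image-text pairs",
        trainable="visual projector only (assumed)",
        objective="align visual features with the LLM's embedding space",
    ),
    TrainingStage(
        name="stage2_boundary_awareness",
        data="multi-event videos with temporal QA",
        trainable="LoRA adapters on the LLM (assumed)",
        objective="teach event boundaries and timestamps within a video",
    ),
    TrainingStage(
        name="stage3_instruction_tuning",
        data="high-quality dialogue datasets",
        trainable="LoRA adapters on the LLM (assumed)",
        objective="refine temporal reasoning and align with human intent",
    ),
]

if __name__ == "__main__":
    for stage in STAGES:
        print(f"{stage.name}: train {stage.trainable} on {stage.data} -> {stage.objective}")
```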
Quick Start & Requirements
- Install dependencies with pip install -r requirements.txt inside a conda environment (python=3.10).
- Install FlashAttention: pip install ninja flash-attn --no-build-isolation.
- See offline_demo.md for running the offline demo and train.md for training instructions (a hedged answer-parsing sketch follows this list).
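The offline demo presumably answers temporal questions about a video in text form. The snippet below is a minimal, hypothetical sketch of post-processing such an answer into a (start, end) span; the "from X to Y" format, the parse_span helper, and the 100-frame assumption are illustrative assumptions, not the project's actual interface, which is documented in offline_demo.md.

```python
import re

# Hypothetical post-processing sketch. Temporal-grounding answers describe a
# moment as a span over sampled frames; the exact textual format is defined by
# the repo (see offline_demo.md), so treat this parser as illustrative only.

SPAN_PATTERN = re.compile(r"from\s+(\d+)\s+to\s+(\d+)", re.IGNORECASE)

def parse_span(answer: str, num_frames: int = 100):
    """Extract a (start, end) pair of relative frame indices from a model answer."""
    match = SPAN_PATTERN.search(answer)
    if match is None:
        return None
    start, end = int(match.group(1)), int(match.group(2))
    # Clamp to the sampled-frame range and keep the span ordered.
    start = max(0, min(num_frames - 1, start))
    end = max(0, min(num_frames - 1, end))
    return tuple(sorted((start, end)))

if __name__ == "__main__":
    # Example answer string; real model output may differ in wording and format.
    print(parse_span("The person opens the door from 12 to 37."))  # (12, 37)
```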
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is released under a non-commercial license, restricting its use in commercial products. Specific hardware requirements for training or running the models are not detailed in the README.