X-Temporal by Sense-X

Video understanding codebase using PyTorch

Created 6 years ago

445 stars

Top 67.3% on SourcePulse

Project Summary

This repository provides a PyTorch-based codebase for state-of-the-art video understanding tasks, targeting researchers and engineers. It simplifies the implementation and evaluation of various video classification models, offering a high-performance, modular design for rapid experimentation with novel research ideas.

How It Works

X-Temporal supports multiple input formats, including raw videos, RGB frames, and optical flow frames, and is designed to handle both single-label and multi-label datasets. Its modular architecture allows for easy integration and comparison of popular video understanding frameworks like SlowFast, R(2+1)D, R3D, TSN, and TSM.

Quick Start & Requirements

Install: Clone the repository and run ./easy_setup.sh.
Prerequisites: PyTorch 1.0+, tensorboardX, tqdm, scikit-learn, decord. FFmpeg is recommended for frame extraction.
Data Format: Supports meta files with video path, frame count, and category ID, or direct reading of original video files using decord. Multi-label datasets require categories separated by commas.
Resources: Training and testing require GPU resources and a prepared dataset.
Links: Challenge Website
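
The meta file format described under Data Format can be illustrated with a small parser. This is a minimal sketch, not the repository's own loader: it assumes each line holds the video path, frame count, and category field separated by whitespace, and the names `VideoRecord`, `parse_meta_line`, and `multi_hot` are hypothetical. Only the comma-separated multi-label convention comes from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoRecord:
    path: str          # video file or frame directory
    num_frames: int    # number of frames in the clip
    labels: List[int]  # one ID for single-label, several for multi-label


def parse_meta_line(line: str) -> VideoRecord:
    """Parse one meta line of the ASSUMED form
    "<path> <frame_count> <label>[,<label>...]".
    The whitespace delimiter is an assumption, not confirmed by the project.
    """
    # Split from the right so a path containing spaces stays intact.
    path, num_frames, label_field = line.strip().rsplit(maxsplit=2)
    labels = [int(token) for token in label_field.split(",")]
    return VideoRecord(path=path, num_frames=int(num_frames), labels=labels)


def multi_hot(labels: List[int], num_classes: int) -> List[float]:
    """Turn a list of category IDs into a multi-hot target vector."""
    target = [0.0] * num_classes
    for index in labels:
        target[index] = 1.0
    return target


single = parse_meta_line("frames/video_1 120 3")
multi = parse_meta_line("frames/video_2 64 0,2")
multi_target = multi_hot(multi.labels, num_classes=4)
```

A single-label line yields a one-element `labels` list, so the same record type serves both dataset kinds, and a multi-hot vector is the usual target shape for a multi-label loss.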

Highlighted Details

Implements SOTA video understanding methods including TSN, TIN, TSM, R(2+1)D, R3D, and SlowFast.
Supports popular datasets like Kinetics, Something-Something, and Multi-Moments in Time.
Achieved 1st place in the ICCV 2019 Multi-Moments in Time Challenge.
Offers flexibility in input data types (raw video, frames, flow) and label types (single/multi-label).

Maintenance & Community

Maintained by Hao Shao, ManYuan Zhang, and Yu Liu. The project was released in August 2020.

Licensing & Compatibility

Released under the MIT license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The codebase was last updated in August 2020. Compatibility with current PyTorch versions is not guaranteed, and models published since then are not covered.

Health Check

Last Commit: 4 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 0 stars in the last 30 days
