Chinese video-language dataset and benchmarks for pre-training
Youku-mPLUG provides a large-scale Chinese video-language dataset and benchmarks for pre-training and evaluating multimodal models. It targets researchers and developers working on video understanding and generation tasks, offering a substantial resource for advancing Chinese multimodal AI capabilities.
How It Works
The project introduces Youku-mPLUG, a 10-million-pair Chinese video-text dataset curated from the Youku platform with an emphasis on safety, diversity, and quality, spanning 20 super categories and 45 specific categories. It also provides three downstream benchmarks: Video Category Prediction, Video-Text Retrieval, and Video Captioning. The core approach pre-trains video-language models built on large language model decoders (GPT-3 1.3B/2.7B, BloomZ-7B) on this dataset and fine-tunes them for these downstream tasks.
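As a rough illustration of how such video-text annotations can be consumed, here is a minimal Python sketch that groups pairs by category for the Video Category Prediction benchmark and exposes (video, caption) pairs for retrieval and captioning. The file name and column names are assumptions for illustration only; consult the released files for the actual schema.

```python
import csv
from collections import Counter

def load_annotations(path="youku_mplug_train.csv"):
    # NOTE: the file name and column names below are assumptions,
    # not the dataset's documented schema.
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append({
                "video_id": row["video_id"],   # assumed column name
                "caption": row["title"],       # assumed column name
                "category": row["category"],   # assumed column name
            })
    return pairs

if __name__ == "__main__":
    pairs = load_annotations()
    # Category distribution gives a quick view of the super categories.
    print(Counter(p["category"] for p in pairs).most_common(20))
    # Video-Text Retrieval and Video Captioning consume (video_id, caption) pairs directly.
    print(pairs[0]["video_id"], pairs[0]["caption"])
```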
Quick Start & Requirements
Setup:
conda env create -f environment.yml
conda activate youku
pip install megatron_util==1.3.0 -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

For caption evaluation, a Java Runtime Environment is needed:
apt-get install default-jre

Key dependencies are megatron_util and the Java Runtime Environment (caption evaluation only). Pre-trained checkpoints for the GPT-3 1.3B/2.7B and BloomZ-7B models are required and are available via ModelScope and Hugging Face.
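Since the checkpoints are distributed through ModelScope and Hugging Face, one way to fetch them is the ModelScope SDK, sketched below. The model id is a placeholder, not the project's actual repository name; substitute the checkpoint listed on the project page (or use huggingface_hub.snapshot_download instead).

```python
from modelscope.hub.snapshot_download import snapshot_download

# Placeholder model id -- replace with the actual Youku-mPLUG checkpoint
# name published on ModelScope.
MODEL_ID = "your-org/your-youku-mplug-checkpoint"

# Downloads the checkpoint files and returns the local cache directory.
local_dir = snapshot_download(MODEL_ID)
print("checkpoint downloaded to:", local_dir)
```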
Maintenance & Community
The project is associated with authors from various institutions, indicating academic backing. Links to community resources like Discord or Slack are not explicitly provided in the README.
Licensing & Compatibility
The README does not specify a license for the dataset or the code. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
A specific bug in megatron_util requires manually replacing its initialize.py file after installation (see the sketch below). The dataset and models focus primarily on Chinese content, and the absence of licensing details may hinder widespread adoption.
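A minimal sketch of applying the manual patch, assuming the repository ships a fixed initialize.py; the source path below is an assumption, so use the file location named in the project's README.

```python
import os
import shutil
import megatron_util

# Locate the installed megatron_util package directory.
pkg_dir = os.path.dirname(megatron_util.__file__)
target = os.path.join(pkg_dir, "initialize.py")

# Path to the patched file is an assumption -- point this at the file
# the README tells you to use.
patched = "patches/initialize.py"

shutil.copyfile(patched, target)
print("replaced", target)
```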