Youku-mPLUG by X-PLUG

Chinese video-language dataset and benchmarks for pre-training

created 2 years ago
299 stars

Top 90.0% on sourcepulse

Project Summary

Youku-mPLUG provides a large-scale Chinese video-language dataset and benchmarks for pre-training and evaluating multimodal models. It targets researchers and developers working on video understanding and generation tasks, offering a substantial resource for advancing Chinese multimodal AI capabilities.

How It Works

The project introduces Youku-mPLUG, a 10-million video-text pair dataset curated from the Youku video platform with an emphasis on safety, diversity, and quality, spanning 20 super categories and 45 specific categories. It also provides three downstream benchmarks: Video Category Prediction, Video-Text Retrieval, and Video Captioning. The core approach is to pre-train video-language models built on large language model decoders (e.g., GPT-3 1.3B/2.7B) on this dataset, then fine-tune them for the specific downstream tasks.
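The dataset structure and benchmarks described above can be illustrated with a minimal sketch. The field names, category labels, and record values here are hypothetical, not the dataset's actual schema; only the counts (20 super categories, 45 categories, three benchmarks) come from the source.

```python
# Illustrative sketch of a Youku-mPLUG-style video-text pair and the three
# downstream benchmark tasks. Field names and example labels are hypothetical;
# consult the released dataset for the real schema.
from dataclasses import dataclass

@dataclass
class VideoTextPair:
    video_id: str        # clip identifier on the Youku platform
    title: str           # Chinese text paired with the clip
    super_category: str  # one of the 20 super categories
    category: str        # one of the 45 specific categories

# The three downstream benchmarks shipped with the dataset.
BENCHMARKS = ("video_category_prediction", "video_text_retrieval", "video_captioning")

def top1_accuracy(predictions, labels):
    """Top-1 accuracy, the natural metric for Video Category Prediction."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

pair = VideoTextPair("v_0001", "川菜家常做法", "美食", "川菜")
print(top1_accuracy(["美食", "旅游"], ["美食", "体育"]))  # 0.5
```

Retrieval and captioning use different metrics (recall@k, CIDEr/BLEU), but the record layout above is the shared starting point for all three tasks.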

Quick Start & Requirements

  • Installation: conda env create -f environment.yml, conda activate youku, pip install megatron_util==1.3.0 -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html. For caption evaluation: apt-get install default-jre.
  • Prerequisites: Python 3.10, Conda, megatron_util, Java Runtime Environment for evaluation. Pre-trained checkpoints for GPT-3 1.3B/2.7B and BloomZ-7B models are required and available via Modelscope and HuggingFace.
  • Resources: Pre-training requires significant computational resources (e.g., 8 GPUs, DeepSpeed, bf16). Inference with mPLUG-Video (BloomZ-7B) requires PyTorch and HuggingFace Transformers.
  • Links: Paper, Modelscope, mPLUG-Owl Repo, mPLUG-Video Checkpoint.
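The installation steps above can be collected into a single setup script. The commands and URLs are taken verbatim from the bullet list; the `conda.sh` sourcing line and the `-y` flag are assumptions needed to make the script run non-interactively.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: sourcing conda.sh is required for `conda activate`
# to work inside a non-interactive script.
source "$(conda info --base)/etc/profile.d/conda.sh"

# Create and activate the environment shipped with the repository.
conda env create -f environment.yml
conda activate youku

# Install the pinned megatron_util build from the ModelScope index.
pip install megatron_util==1.3.0 \
    -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html

# Java Runtime Environment, needed only for caption evaluation.
apt-get install -y default-jre
```

Run this from the root of the cloned repository (where environment.yml lives); the apt-get step requires root privileges.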

Highlighted Details

  • 10 million high-quality Chinese video-text pairs.
  • Three distinct downstream benchmarks for comprehensive evaluation.
  • Supports pre-training with large models like GPT-3 and fine-tuning with mPLUG-Owl.
  • Inference example provided for video question answering using mPLUG-Video (BloomZ-7B).

Maintenance & Community

The project is published under the X-PLUG organization, home to Alibaba's mPLUG family of multimodal models, and its authors span academic and industrial institutions. Links to community resources such as Discord or Slack are not provided in the README.

Licensing & Compatibility

The README does not specify a license for the dataset or the code. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

A known bug in megatron_util requires manually replacing an initialize.py file after installation. The dataset and models focus on Chinese-language content, and the absence of licensing details may limit broader adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
