CLIP-Chinese by yangjianxin1

CLIP model for Chinese multimodal tasks

Created 2 years ago · 416 stars · Top 71.5% on sourcepulse

Project Summary

CLIP-Chinese provides a CLIP model pre-trained for the Chinese language, enabling multimodal understanding tasks such as image-text retrieval and similarity matching for Chinese users. It addresses the English-centric limitation of the original CLIP by pairing a ViT image encoder with a BERT text encoder trained on a large Chinese image-text dataset.

How It Works

The project implements a CLIP model with a ViT (Vision Transformer) image encoder and a BERT-based text encoder. It uses a Locked-image text Tuning (LiT) strategy: the ViT weights are frozen and only the BERT component is trained, on 1.4 million Chinese image-text pairs. The image encoder is initialized from OpenAI's CLIP ViT and the text encoder from the pre-trained Mengzi BERT weights, aiming for efficient transfer learning and strong performance on Chinese multimodal tasks.
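The LiT setup described above can be sketched with a toy model: the image tower's parameters are frozen, and only the text tower's parameters are handed to the optimizer. The class and layer sizes below are illustrative stand-ins, not the project's actual architecture.

```python
import torch
from torch import nn

class TinyCLIP(nn.Module):
    """Toy stand-in for the ViT+BERT CLIP model (the real encoders are far larger)."""
    def __init__(self, dim=8):
        super().__init__()
        self.vision_encoder = nn.Linear(16, dim)  # stands in for the frozen ViT
        self.text_encoder = nn.Linear(32, dim)    # stands in for the trainable BERT

def apply_lit_tuning(model):
    """Locked-image tuning: freeze the image tower, train only the text tower."""
    for p in model.vision_encoder.parameters():
        p.requires_grad = False
    # Only the still-trainable (text-encoder) parameters go to the optimizer.
    return [p for p in model.parameters() if p.requires_grad]

model = TinyCLIP()
trainable = apply_lit_tuning(model)
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Because the frozen ViT never receives gradient updates, its image representations remain those of OpenAI's CLIP, which is why the README attributes the image encoder's capabilities to the original model.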

Quick Start & Requirements

  • Install: pip install transformers==4.18.0 torch==1.12.0 (the versions pinned by the project)
  • Prerequisites: Python 3.8, PyTorch.
  • Usage: Load pre-trained weights from Hugging Face (YeungNLP/clip-vit-bert-chinese-1M) using BertCLIPModel.from_pretrained and CLIPProcessor.from_pretrained.
  • Resources: Training requires significant computational resources (GPU recommended). Pre-trained models are available on Hugging Face.
  • Data: 1.4 million Chinese image-text pairs are available via the linked WeChat public account.
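The usage bullet above can be sketched as follows. BertCLIPModel is the repository's custom class, so the import path below is an assumption about the repo layout; the image path and captions are placeholders, and running this downloads the checkpoint from Hugging Face.

```python
import torch
from PIL import Image
from transformers import CLIPProcessor

# BertCLIPModel is defined in the CLIP-Chinese repository; this import
# path is an assumption -- adjust it to wherever the class lives.
from component.model import BertCLIPModel

name = "YeungNLP/clip-vit-bert-chinese-1M"
model = BertCLIPModel.from_pretrained(name)
processor = CLIPProcessor.from_pretrained(name)

image = Image.open("example.jpg")  # placeholder image
texts = ["一只猫", "一只狗"]        # "a cat", "a dog"

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-text similarity scores; softmax turns them
# into probabilities over the candidate captions.
probs = outputs.logits_per_image.softmax(dim=-1)
```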

Highlighted Details

  • Offers pre-trained weights for the full CLIP model and a standalone BERT encoder.
  • Provides scripts for similarity calculation (image-text, text-text, image-image).
  • Demonstrates performance with example similarity scores.
  • Includes a data downloading script and configurable training parameters.
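All three similarity modes listed above (image-text, text-text, image-image) reduce to cosine similarity between encoder outputs. A minimal numpy sketch, where the toy embeddings stand in for real image or text features:

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

# Toy embeddings standing in for encoder outputs.
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[1.0, 0.1], [0.1, 1.0]])
sims = cosine_similarity(emb_a, emb_b)  # shape (2, 2); diagonal pairs match best
```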

Maintenance & Community

  • Developed by yangjianxin1.
  • Pre-trained weights and data are shared via Hugging Face and a WeChat public account.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project's license is not specified, which may impact commercial adoption. The README notes that the image encoder's capabilities are primarily inherited from OpenAI's CLIP due to weight freezing during LiT tuning.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
