Multilingual-CLIP by FreddeFrallan

Multilingual text encoders leveraging OpenAI's CLIP model

created 4 years ago
808 stars

Top 44.6% on sourcepulse

Project Summary

This repository provides pre-trained CLIP text encoders for multiple languages, enabling cross-lingual and multilingual text-image retrieval. It is aimed at researchers and developers working with multimodal AI who need to bridge language barriers in visual understanding tasks. Its key advantage is that CLIP's image-text matching can be applied across a wide range of languages without requiring large multilingual image-text datasets.

How It Works

The project adapts OpenAI's CLIP architecture by training a new multilingual text encoder with a cross-lingual teacher-learning approach: machine-translated sentences are aligned with the English CLIP embeddings of their source texts, transferring CLIP's visual-textual understanding to new languages. Because no target-language image-text pairs are needed, the method is efficient and scales to low-resource languages.
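
The idea can be summarized as embedding distillation. Below is a minimal sketch of that objective, assuming OpenAI's clip package and an xlm-roberta-base student from Hugging Face; it illustrates the teacher-learning loss, not the repository's actual training code.

    import clip
    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Frozen English CLIP text encoder acts as the teacher.
    teacher, _ = clip.load("ViT-B/32", device=device)
    teacher.eval()

    # Student: a multilingual transformer plus a linear head mapping XLM-R's
    # 768-dim hidden states into CLIP ViT-B/32's 512-dim text space.
    tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
    student = AutoModel.from_pretrained("xlm-roberta-base").to(device)
    head = nn.Linear(768, 512).to(device)

    # Toy (English, machine translation) pair standing in for a real corpus.
    pairs = [("a photo of a dog", "ein Foto von einem Hund")]
    opt = torch.optim.Adam(
        list(student.parameters()) + list(head.parameters()), lr=1e-5
    )

    for english, translation in pairs:
        with torch.no_grad():
            target = teacher.encode_text(clip.tokenize([english]).to(device)).float()
        toks = tokenizer([translation], return_tensors="pt").to(device)
        pooled = student(**toks).last_hidden_state.mean(dim=1)  # mean-pool tokens
        loss = nn.functional.mse_loss(head(pooled), target)
        opt.zero_grad()
        loss.backward()
        opt.step()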

Quick Start & Requirements

  • Install: pip install multilingual-clip torch (or pip install multilingual-clip tensorflow for the TensorFlow backend)
  • Requirements: Python >= 3.6.9, transformers == 4.8.1, and either PyTorch or TensorFlow (a PyTorch inference sketch follows this list).
  • Demo: Live Demo
  • Colab Notebook: Multilingual_CLIP.ipynb
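
A minimal PyTorch inference sketch, following the usage pattern documented in the repository's README (the model name is one of the published M-CLIP checkpoints; verify it against the repo's model list):

    from multilingual_clip import pt_multilingual_clip
    import transformers

    texts = [
        "Three blind horses listening to Mozart.",
        "Älgen är skogens konung!",  # Swedish: "The moose is the king of the forest!"
    ]

    # One of the published checkpoints, aligned to CLIP ViT-L/14's text space.
    model_name = "M-CLIP/XLM-Roberta-Large-Vit-L-14"
    model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
    tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

    # Returns one CLIP-aligned embedding per input sentence.
    embeddings = model.forward(texts, tokenizer)
    print(embeddings.shape)  # e.g. torch.Size([2, 768]) for ViT-L/14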

Highlighted Details

  • Offers multiple pre-trained models pairing different transformer architectures (e.g., XLM-R, LaBSE) with CLIP vision models (e.g., ViT-L/14, ViT-B/32); see the retrieval sketch after this list.
  • Provides benchmark results on MS-COCO for various languages, showing competitive performance.
  • Includes code for both PyTorch and TensorFlow inference, as well as TensorFlow training.
  • Supports a wide range of languages, with specific models fine-tuned for up to 109 languages.
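
For cross-lingual retrieval, the text embeddings from a given M-CLIP model are scored against image embeddings from its matching OpenAI CLIP vision model. A sketch assuming the XLM-R Large + ViT-B/32 pairing, with photo.jpg as a placeholder image:

    import clip
    import torch
    import transformers
    from multilingual_clip import pt_multilingual_clip
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Text side: multilingual encoder aligned to CLIP ViT-B/32's space.
    text_name = "M-CLIP/XLM-Roberta-Large-Vit-B-32"
    text_model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(text_name)
    text_tok = transformers.AutoTokenizer.from_pretrained(text_name)

    # Image side: the matching OpenAI CLIP vision model.
    vision_model, preprocess = clip.load("ViT-B/32", device=device)

    captions = ["Ein Hund im Schnee", "Un chat sur le canapé"]  # German, French
    image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)

    with torch.no_grad():
        text_emb = text_model.forward(captions, text_tok)
        img_emb = vision_model.encode_image(image).float().cpu()

    # Cosine similarity ranks the multilingual captions against the image.
    print(torch.nn.functional.cosine_similarity(text_emb, img_emb))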

Maintenance & Community

  • Contact: Fredrik.Carlsson@ri.se for questions.
  • Contributions are welcomed for new language models.
  • Acknowledgements include Stability.ai for compute resources.

Licensing & Compatibility

  • License: MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

Legacy models require a separate download of their linear projection weights and are less integrated than the newer Hugging Face-hosted models. Performance varies across languages, particularly for those underrepresented in the training data.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 13 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 4 more.

open_flamingo by mlfoundations
  • Open-source framework for training large multimodal models
  • Top 0.1% · 4k stars
  • created 2 years ago, updated 11 months ago