Multilingual-CLIP by FreddeFrallan

Multilingual text encoders leveraging OpenAI's CLIP model

Created 4 years ago
811 stars

Top 43.6% on SourcePulse

View on GitHub
Project Summary

This repository provides pre-trained CLIP text encoders for multiple languages, enabling cross-lingual and multilingual text-image retrieval. It is designed for researchers and developers working with multimodal AI who need to bridge language barriers in visual understanding tasks. Its key advantage is that users can apply CLIP's image-text matching across a wide array of languages without collecting extensive multilingual image-text datasets.

How It Works

The project adapts OpenAI's CLIP architecture by retraining its text encoder using a cross-lingual teacher learning approach. This method utilizes machine translation to align text from various languages with English CLIP embeddings, effectively transferring CLIP's visual-textual understanding to new languages. This approach bypasses the need for target-language image-text pairs, making it efficient and scalable for low-resource languages.
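
A minimal sketch of one such teacher-learning step is shown below. The repository's actual training code is in TensorFlow; the PyTorch framing, model names, mean-pooling, and MSE objective here are illustrative assumptions, not the repository's exact method:

```python
# Illustrative cross-lingual teacher learning: a multilingual student is
# trained to reproduce frozen English CLIP text embeddings for
# machine-translated captions. Details are assumptions, not the repo's code.
import torch
import torch.nn as nn
import clip  # OpenAI CLIP, used as the frozen teacher
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
teacher, _ = clip.load("ViT-B/32", device=device)  # 512-dim text embeddings

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")
student = AutoModel.from_pretrained("xlm-roberta-large").to(device)
# Linear head mapping the student's hidden size into CLIP's embedding space.
projection = nn.Linear(student.config.hidden_size, 512).to(device)

params = list(student.parameters()) + list(projection.parameters())
optimizer = torch.optim.Adam(params, lr=1e-5)
loss_fn = nn.MSELoss()

def train_step(english_texts, translated_texts):
    # Teacher target: frozen CLIP embeddings of the English captions.
    with torch.no_grad():
        tokens = clip.tokenize(english_texts).to(device)
        target = teacher.encode_text(tokens).float()
    # Student: embed the machine-translated captions, mean-pool, project.
    batch = tokenizer(translated_texts, padding=True, truncation=True,
                      return_tensors="pt").to(device)
    hidden = student(**batch).last_hidden_state
    mask = batch.attention_mask.unsqueeze(-1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)
    loss = loss_fn(projection(pooled), target)  # pull student toward teacher
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because only paired translations are needed as supervision, any parallel corpus or machine-translation output can serve as training data, with no target-language images involved.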

Quick Start & Requirements

  • Install: pip install multilingual-clip torch (or pip install multilingual-clip tensorflow); a usage sketch follows this list.
  • Requirements: Python >= 3.6.9, Transformers == 4.8.1, and either PyTorch or TensorFlow.
  • Demo: Live Demo
  • Colab Notebook: Multilingual_CLIP.ipynb
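
For the Huggingface-hosted checkpoints, PyTorch inference looks roughly like this (a sketch following the repository's documented API; the checkpoint name is one of the published M-CLIP models):

```python
from multilingual_clip import pt_multilingual_clip
import transformers

texts = [
    "Three blind horses listening to Mozart.",
    "Två hundar som leker i snön.",  # Swedish: "Two dogs playing in the snow."
]
model_name = "M-CLIP/XLM-Roberta-Large-Vit-L-14"

# Load the multilingual text encoder and its tokenizer.
model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

# Embeddings live in the same space as the paired CLIP vision model (ViT-L/14).
embeddings = model.forward(texts, tokenizer)
print(embeddings.shape)  # e.g. torch.Size([2, 768])
```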

Highlighted Details

  • Offers multiple pre-trained models based on different transformer architectures (e.g., XLM-R, LaBSE) and vision models (e.g., ViT-L/14, ViT-B/32).
  • Provides benchmark results on MS-COCO for various languages, showing competitive performance.
  • Includes code for both PyTorch and TensorFlow inference, as well as TensorFlow training; a retrieval sketch follows this list.
  • Supports a wide range of languages, with specific models fine-tuned for up to 109 languages.
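
Pairing a text encoder with its matching CLIP vision model gives cross-lingual retrieval. A hypothetical end-to-end sketch, where the image path and the Swedish query are placeholders:

```python
import torch
import clip  # OpenAI CLIP supplies the matching vision model
import transformers
from PIL import Image
from multilingual_clip import pt_multilingual_clip

device = "cuda" if torch.cuda.is_available() else "cpu"
vision_model, preprocess = clip.load("ViT-L/14", device=device)

model_name = "M-CLIP/XLM-Roberta-Large-Vit-L-14"
text_model = pt_multilingual_clip.MultilingualCLIP.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

image = preprocess(Image.open("dog.jpg")).unsqueeze(0).to(device)  # placeholder file
with torch.no_grad():
    image_emb = vision_model.encode_image(image).float()
    text_emb = text_model.forward(["En hund som leker i snön"], tokenizer).to(device)

# Cosine similarity between the image and the non-English query.
image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
print((image_emb @ text_emb.T).item())
```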

Maintenance & Community

  • Contact: Fredrik.Carlsson@ri.se for questions.
  • Contributions are welcomed for new language models.
  • Acknowledgements include Stability.ai for compute resources.

Licensing & Compatibility

  • License: MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

Legacy models require a separate download of their linear weights and are less integrated than the newer Huggingface-hosted models. Performance can vary across languages, particularly for those with less representation in the training data.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

METER by zdou0830
Multimodal framework for vision-and-language transformer research. 373 stars. Created 3 years ago, updated 2 years ago. Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

open_flamingo by mlfoundations
Open-source framework for training large multimodal models. 4k stars. Created 2 years ago, updated 1 year ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.