clip-pytorch by bubbliiiing

CLIP for transferable visual-language models

Created 3 years ago
252 stars

Top 99.6% on SourcePulse

Summary

This repository provides a PyTorch implementation of CLIP (Contrastive Language–Image Pre-training) for training transferable visual models directly from natural language supervision. It targets researchers and developers who want to adapt CLIP to their own custom datasets, with explicit support for both Chinese and English captions, and offers a practical framework for building and deploying vision-language models for a wide range of downstream applications.

How It Works

The project implements CLIP in PyTorch and provides a framework for training models on user-provided image-caption datasets. Distinct, executable scripts cover the full lifecycle: training (train.py), inference/prediction (predict.py), and evaluation (eval.py). The core method is contrastive learning: image and text embeddings are trained jointly so that visual features align with their corresponding natural-language descriptions. The implementation emphasizes customizability, letting users move beyond standard pre-trained models, and builds on foundational architectures referenced from established works such as OpenAI's CLIP and Alibaba's AliceMind.
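The contrastive objective described above can be sketched in a few lines. The snippet below is a minimal, illustrative NumPy version, not the repository's actual code: the encoders are stubbed with random vectors, and the temperature value 0.07 is the commonly cited CLIP default, assumed here.

```python
# Minimal sketch of CLIP-style contrastive alignment (illustrative only).
# Real encoders are vision/text transformers; here they are stubbed.
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 4, 8

# Stand-ins for encoder outputs on a batch of matched image-caption pairs.
img_emb = rng.normal(size=(batch, dim))
txt_emb = img_emb + 0.01 * rng.normal(size=(batch, dim))  # captions near their images

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img_emb, txt_emb = l2_normalize(img_emb), l2_normalize(txt_emb)

# Pairwise cosine similarities scaled by a temperature; the diagonal
# holds the matched image-caption pairs.
logits = img_emb @ txt_emb.T / 0.07

def cross_entropy(logits, targets):
    # Row-wise softmax cross-entropy against integer targets.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

targets = np.arange(batch)
# Symmetric loss: images -> texts and texts -> images.
loss = 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))
```

Minimizing this symmetric loss pulls each image toward its own captions and pushes it away from the rest of the batch, which is the alignment mechanism the training script optimizes.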

Quick Start & Requirements

  • Primary workflow: train with train.py, run inference with predict.py, and evaluate with eval.py.
  • Prerequisites: PyTorch 1.7.1 or newer.
  • Dependencies: pre-trained weights and example datasets (such as Flickr8k) must be downloaded via the provided Baidu Netdisk links.
  • Links:
    • Pre-trained weights: https://pan.baidu.com/s/1b9Nt-UuqOJfhbhJYVyrK0g (Code: mfnc)
    • Flickr8k dataset: https://pan.baidu.com/s/1UzaGmbEGz1BXZ0IXK1TT7g (Code: exg3)
    • OpenAI CLIP: https://github.com/openai/CLIP
    • AliceMind: https://github.com/alibaba/AliceMind
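The three-script workflow can be sketched as a command sequence. Script names come from the README; any configuration (dataset paths, model choice) lives inside the scripts themselves and is not shown here.

```shell
# Hypothetical end-to-end workflow for a custom dataset (sketch only):
# 1. Download pre-trained weights and a dataset via the Baidu Netdisk links above.
# 2. Train on your image-caption pairs.
python train.py
# 3. Run inference/prediction.
python predict.py
# 4. Evaluate retrieval performance.
python eval.py
```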

Highlighted Details

  • Explicitly supports the training of CLIP models on custom datasets, accommodating both Chinese and English language inputs.
  • Delivers a complete suite of scripts covering the entire workflow: model training, prediction/inference, and comprehensive evaluation.
  • Provides clear instructions and a defined JSON data format for preparing custom datasets, including image paths and multiple caption options.
  • Draws architectural inspiration and context from the foundational OpenAI CLIP and Alibaba AliceMind projects.
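The summary mentions a JSON data format with image paths and multiple captions per image, but does not reproduce the schema. As a hedged sketch, an entry might be built and sanity-checked as below; the field names `image` and `caption` and the example path are assumptions, and the repository's README defines the authoritative format.

```python
# Hypothetical dataset entry; consult the repo's README for the real schema.
import json

entry = {
    "image": "datasets/flickr8k-images/example.jpg",  # image path (field name assumed)
    "caption": [                                      # multiple captions per image
        "A dog runs across the grass.",
        "A brown dog is playing outside.",
    ],
}

def validate(entry):
    # Minimal sanity checks: one image path, at least one caption string.
    assert isinstance(entry["image"], str) and entry["image"]
    assert isinstance(entry["caption"], list) and entry["caption"]
    assert all(isinstance(c, str) for c in entry["caption"])

validate(entry)
serialized = json.dumps([entry], ensure_ascii=False, indent=2)
```

Validating entries before training catches malformed paths or empty caption lists early, instead of mid-epoch in the data loader.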

Maintenance & Community

The provided README does not contain specific information regarding project maintainers, community support channels (such as Discord or Slack), or a public roadmap, limiting visibility into project health and future development.

Licensing & Compatibility

The README does not state a software license. This prevents any assessment of the project's suitability for commercial applications or integration into proprietary, closed-source software.

Limitations & Caveats

  • Reliance on Baidu Netdisk for critical assets (pre-trained weights, datasets) may pose accessibility or long-term availability concerns for users outside certain regions or behind network restrictions.
  • The absence of a defined software license creates significant ambiguity about usage rights, particularly for commercial or collaborative development.
  • Non-English datasets such as Chinese require manual code changes (e.g., modifying the phi parameter), so multilingual use demands some familiarity with the codebase.
Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830 (375 stars)
Multimodal framework for vision-and-language transformer research
Created 4 years ago, updated 3 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

CLIP_prefix_caption by rmokady (1k stars)
Image captioning model using CLIP embeddings as a prefix
Created 4 years ago, updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

open_flamingo by mlfoundations (4k stars)
Open-source framework for training large multimodal models
Created 3 years ago, updated 1 year ago