Long-CLIP by beichenzbc

Research paper code for extending CLIP's text input length

Created 1 year ago
850 stars

Top 42.0% on SourcePulse

View on GitHub
Project Summary

Long-CLIP enhances CLIP's ability to process extended text inputs, addressing limitations in understanding lengthy descriptions for vision-language tasks. It targets researchers and developers working with long-form text and image data, offering improved performance in retrieval and classification.

How It Works

Long-CLIP extends the standard CLIP text encoder to accept longer sequences, raising the maximum input length from 77 to 248 tokens. Rather than retraining from scratch, the method stretches CLIP's learned positional embeddings, preserving the well-trained early positions and interpolating the rest, then fine-tunes on long captions. The authors report a 20% improvement in R@5 for long-caption text-image retrieval and a 6% improvement on traditional retrieval tasks.
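As an illustration of the positional-embedding stretching idea, here is a minimal sketch, assuming a learned (77, d) CLIP position table and hypothetical `keep`/`ratio` values of 20 and 4 (which give 20 + 4 × 57 = 248 positions); this is not the repository's actual implementation:

```python
import torch
import torch.nn.functional as F

def stretch_positional_embedding(pos_embed: torch.Tensor,
                                 keep: int = 20,
                                 ratio: int = 4) -> torch.Tensor:
    """Stretch a (77, d) position table to (keep + ratio * (77 - keep), d).

    The first `keep` positions (well trained, since most captions are short)
    are copied unchanged; the remaining positions are linearly interpolated
    along the sequence axis.
    """
    head = pos_embed[:keep]                       # (keep, d), kept as-is
    tail = pos_embed[keep:]                       # (77 - keep, d)
    tail = tail.T.unsqueeze(0)                    # (1, d, 77 - keep) for interpolate
    tail = F.interpolate(tail, size=ratio * tail.shape[-1],
                         mode="linear", align_corners=True)
    tail = tail.squeeze(0).T                      # (ratio * (77 - keep), d)
    return torch.cat([head, tail], dim=0)         # (248, d) with the defaults

# toy table: 77 positions, embedding dim 512 -> 248 positions
new_table = stretch_positional_embedding(torch.randn(77, 512))
print(new_table.shape)  # torch.Size([248, 512])
```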

Quick Start & Requirements

  • Install: Clone the repository and install CLIP dependencies.
  • Prerequisites: PyTorch, Pillow, and CLIP. CUDA is recommended for GPU acceleration.
  • Usage: Download checkpoints (longclip-B.pt, longclip-L.pt) and place them in ./checkpoints. Refer to the repository's Python snippet for inference; a hedged sketch follows this list.
  • Evaluation: Scripts are available for zero-shot classification (ImageNet, CIFAR) and text-image retrieval (COCO2017, Flickr30k).
  • Training: Details are in train/train.md.
  • Demos: Available for Long-CLIP-SDXL integration and long-caption retrieval.
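For reference, a minimal inference sketch in the spirit of the README's snippet, assuming the repository's `longclip` module mirrors OpenAI's `clip` API (`load`, `tokenize`, `encode_image`/`encode_text`) and that a local `demo.png` exists; check the README for the authoritative version:

```python
import torch
from PIL import Image
from model import longclip  # assumed import path within the cloned repo

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a checkpoint downloaded to ./checkpoints (see Quick Start above)
model, preprocess = longclip.load("./checkpoints/longclip-B.pt", device=device)

# Unlike vanilla CLIP (77 tokens), captions may run up to 248 tokens
text = longclip.tokenize([
    "A man is crossing the street with a red car parked nearby.",
    "A man is driving a car in an urban scene.",
]).to(device)
image = preprocess(Image.open("demo.png")).unsqueeze(0).to(device)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)  # similarity of the image to each caption
```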

Highlighted Details

  • Increases CLIP's maximum input length from 77 to 248 tokens.
  • Achieves 20% R@5 improvement on long-caption retrieval and 6% on traditional retrieval.
  • Fine-tuning takes approximately 0.5 hours on 8 GPUs.
  • Offers plug-and-play integration with existing CLIP-based workflows, including SDXL.
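On the metric in the first two bullets: R@5 (recall at 5) counts a query as correct when its ground-truth match ranks among the top five candidates by similarity. A minimal sketch of computing R@k from a similarity matrix, using toy tensors rather than the repository's evaluation scripts:

```python
import torch

def recall_at_k(sim: torch.Tensor, k: int = 5) -> float:
    """sim[i, j] = similarity of query i to candidate j; matches lie on the diagonal."""
    topk = sim.topk(k, dim=1).indices                 # indices of the k best candidates
    targets = torch.arange(sim.size(0)).unsqueeze(1)  # ground-truth index per query
    return (topk == targets).any(dim=1).float().mean().item()

# toy example: 8 queries vs. 8 candidates, e.g. text_features @ image_features.T
sim = torch.randn(8, 8)
print(f"R@5: {recall_at_k(sim, k=5):.2f}")
```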

Maintenance & Community

The paper was accepted at ECCV 2024. The README lists updates and bug fixes, though the last commit was about a year ago (see Health Check below). No community channels are explicitly mentioned.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. Commercial use or closed-source linking would therefore require clarifying the licensing terms with the authors.

Limitations & Caveats

The README does not specify a license, which may impede commercial adoption. While the models are presented as plug-and-play, integrating them into CLIP-based applications beyond SDXL may require further investigation.

Health Check
  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830

373 stars · Multimodal framework for vision-and-language transformer research
Created 3 years ago · Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

CLIP_prefix_caption by rmokady

1k stars · Image captioning model using CLIP embeddings as a prefix
Created 4 years ago · Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pawel Garbacki (Cofounder of Fireworks AI), and 4 more.

LongLoRA by dvlab-research

3k stars · Efficient fine-tuning for long-context LLMs
Created 2 years ago · Updated 1 year ago
Starred by Matei Zaharia (Cofounder of Databricks), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 9 more.

LWM by LargeWorldModel

7k stars · Multimodal autoregressive model for long-context video/text
Created 1 year ago · Updated 11 months ago