Framework for image-guided text generation using language models
Top 98.8% on sourcepulse
MAGIC is a training-free framework for integrating visual controls into text generation, enabling language models to perform multimodal tasks like image captioning and visually grounded story generation in a zero-shot manner. It targets researchers and developers working with large language models who need to incorporate visual grounding without extensive retraining. The primary benefit is achieving image-guided text generation with significant speedups over state-of-the-art methods.
How It Works
MAGIC combines an off-the-shelf language model (e.g., GPT-2) with an image-text matching model (e.g., CLIP). During decoding, it introduces a "magic score" derived from CLIP's image-text similarity. This score steers the language model's output toward tokens that are semantically related to a given image while maintaining contextual coherence. Because the method requires no gradient updates or model fine-tuning, it works as a plug-and-play addition to decoding and is computationally efficient.
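A minimal, illustrative sketch of the candidate-reranking idea behind the magic score: each candidate next token's language-model probability is boosted by its (softmax-normalized) CLIP image-text similarity. The scoring formula here is simplified and the constants are assumptions for illustration; the actual MAGIC Search objective also includes contrastive search's degeneration penalty.

```python
import math

def magic_rerank(candidates, beta=2.0):
    """Pick the next token by combining LM probability with a magic score.

    candidates: list of (token, lm_prob, clip_similarity) tuples, where
    clip_similarity is the image-text similarity of the continuation.
    beta is an illustrative weight on the visual signal (an assumption,
    not a value from the paper).
    """
    # Softmax-normalize the CLIP similarities over the candidate set.
    exps = [math.exp(sim) for _, _, sim in candidates]
    z = sum(exps)
    magic = [e / z for e in exps]

    # Simplified combined score: LM probability plus weighted magic score.
    scored = [(tok, p + beta * m)
              for (tok, p, _), m in zip(candidates, magic)]
    return max(scored, key=lambda t: t[1])[0]

# Toy example: "dog" is both likely under the LM and matches the image.
cands = [("dog", 0.40, 0.31), ("cat", 0.35, 0.05), ("car", 0.25, -0.10)]
print(magic_rerank(cands))  # -> dog
```

In the real framework these candidates come from the top-k tokens of the language model at each decoding step, and the CLIP similarity is computed between the image and the text continued with each candidate.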
Quick Start & Requirements
pip3 install -r requirements.txt
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats