lens by ContextualAI

Vision-language research paper using LLMs

Created 2 years ago
353 stars

Top 79.0% on SourcePulse

View on GitHub
Project Summary

LENS (Large Language Models Enhanced to See) provides a system for leveraging large language models (LLMs) for computer vision tasks by first generating rich natural language descriptions of images. This approach targets researchers and developers seeking to integrate vision capabilities into LLMs without requiring model fine-tuning, offering competitive performance against state-of-the-art models.

How It Works

LENS processes images through a suite of vision modules that output detailed natural language captions, tags, objects, and attributes. These textual descriptions are then fed into an LLM, enabling it to perform various vision-related tasks. This method avoids the need for fine-tuning LLMs on visual data, simplifying integration and potentially improving performance through the LLM's inherent language understanding capabilities.
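
For illustration, the minimal sketch below shows the describe-then-prompt idea with off-the-shelf Hugging Face components; it is not the llm-lens API, and the model choices (BLIP for captioning, GPT-2 as a stand-in for a larger frozen LLM) are assumptions made only for this example.

# Sketch of the LENS idea: vision modules turn the image into text,
# and a frozen LLM answers from that text alone (no visual fine-tuning).
# Model choices here are illustrative, not the ones used by llm-lens.
from PIL import Image
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
llm = pipeline("text-generation", model="gpt2")  # stand-in for a larger frozen LLM

image = Image.open("photo.jpg").convert("RGB")   # any local test image
caption = captioner(image)[0]["generated_text"]

prompt = (
    f"Image caption: {caption}\n"
    "Question: What is the image about?\n"
    "Answer:"
)
print(llm(prompt, max_new_tokens=30)[0]["generated_text"])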

Quick Start & Requirements

  • Install via pip: pip install llm-lens (a short usage sketch follows this list)
  • Recommended: Machine with GPUs and CUDA. CPU-only is functional but slower for large datasets.
  • Python 3.9 environment.
  • Official Demo: [Demo]
  • Official Colab: [Colab]
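
The snippet below sketches what using the package might look like after installation; the names Lens and LensProcessor are assumptions drawn from the project's published examples, so consult the official demo or Colab for the exact API.

# Assumed llm-lens usage; Lens and LensProcessor are taken from the
# project's examples and may differ -- check the official demo/Colab.
import torch
from PIL import Image
from lens import Lens, LensProcessor

raw_image = Image.open("photo.jpg").convert("RGB")  # any local test image
question = "What is the image about?"

lens = Lens()                 # runs the vision modules (tags, attributes, captions)
processor = LensProcessor()   # packs images and questions into model inputs
with torch.no_grad():
    samples = processor([raw_image], [question])
    output = lens(samples)    # textual descriptions ready to hand to an LLM
print(output)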

Highlighted Details

  • Generates natural language descriptions for images to be used as input for LLMs.
  • Achieves competitive performance against models like Flamingo, CLIP, and Kosmos without LLM fine-tuning.
  • Supports augmenting Hugging Face datasets with visual descriptions.
  • Future additions include evaluation scripts and vocabulary generation for paper reproducibility.

Maintenance & Community

  • Project is associated with the paper "Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language" (arXiv:2306.16410).
  • Links to official blog and paper are provided.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial or closed-source use is undetermined.

Limitations & Caveats

The repository is marked as "Coming Soon" for several key features, including evaluation on standard datasets and reproduction scripts for the paper's methodology, indicating it may be in an early development stage.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 3 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Omar Sanseviero (DevRel at Google DeepMind).

gill by kohjingyu

463 stars
Multimodal LLM for generating/retrieving images and generating text
Created 2 years ago
Updated 1 year ago