platonic-rep by minyoungg

Research code for computing vision-language alignment scores

Created 1 year ago
615 stars

Top 53.6% on SourcePulse

Project Summary

This project provides tools and precomputed features for evaluating the Platonic Representation Hypothesis, which posits that vision and language models learn similar representations. It is targeted at researchers and practitioners in computer vision and natural language processing who want to quantify the alignment between these modalities. The project enables users to measure and compare vision-language alignment scores across various models.

How It Works

The core approach extracts hidden states from specified layers of both vision and language models, pools these features, and then computes alignment scores with metrics such as mutual k-NN. This gives a quantitative measure of how similarly different models represent the same concepts across modalities, and the precomputed features let users skip the extraction step.
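As a rough illustration of the mutual k-NN idea, the sketch below scores two feature sets that describe the same samples by the average fraction of nearest neighbors they share. It assumes cosine similarity on already pooled features. The names `knn_indices` and `mutual_knn_alignment` are hypothetical, and the repository's own implementation may differ in normalization, defaults, and batching.

```python
import numpy as np


def knn_indices(feats: np.ndarray, k: int) -> np.ndarray:
    """Indices of each row's k nearest neighbors under cosine similarity."""
    # L2-normalize rows so the inner product equals cosine similarity.
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    # A sample must not count as its own neighbor.
    np.fill_diagonal(sim, -np.inf)
    # Sort by descending similarity and keep the first k columns.
    return np.argsort(-sim, axis=1)[:, :k]


def mutual_knn_alignment(feats_a: np.ndarray, feats_b: np.ndarray, k: int = 10) -> float:
    """Mean fraction of k-NN shared between two feature sets for the same samples."""
    if feats_a.shape[0] != feats_b.shape[0]:
        raise ValueError("both feature sets must describe the same samples, in the same order")
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))


rng = np.random.default_rng(0)
vision = rng.normal(size=(100, 64))                    # stand-in for pooled vision features
rotation, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random orthogonal matrix
# A rotation preserves cosine similarity, so the neighbor sets should match.
print(mutual_knn_alignment(vision, vision @ rotation, k=5))
# Unrelated features share only a chance-level fraction of neighbors.
print(mutual_knn_alignment(vision, rng.normal(size=(100, 32)), k=5))
```

Because the score only depends on neighborhood structure, the two feature sets may have different dimensionalities, which is what makes this kind of metric usable across a vision model and a language model.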

Quick Start & Requirements

  • Install via pip: pip install -e .
  • Requirements: Python 3.11 and PyTorch 2.2.0; install the remaining dependencies with pip install -r requirements.txt.
  • Precomputed features are available for download.
  • Examples for vision and language model feature extraction and scoring are provided.
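Feature extraction ends with a pooling step that turns per-token hidden states into per-sample vectors. A minimal sketch of that step, assuming mean pooling over non-padding tokens and a model that returns one (batch, seq_len, dim) array per layer, might look like this. `pool_layers` is a hypothetical helper, and the repository's actual pooling choice (for example, how ViT tokens are handled) is not specified here.

```python
import numpy as np


def pool_layers(hidden_states, attention_mask) -> np.ndarray:
    """Mean-pool each layer's token features over non-padding positions.

    hidden_states: sequence with one (batch, seq_len, dim) array per layer.
    attention_mask: (batch, seq_len) array, 1 for real tokens and 0 for padding.
    Returns an array of shape (batch, n_layers, dim).
    """
    mask = np.asarray(attention_mask, dtype=np.float64)[..., None]  # (batch, seq_len, 1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)                   # (batch, 1); avoids division by zero
    pooled = [(np.asarray(h) * mask).sum(axis=1) / counts for h in hidden_states]
    return np.stack(pooled, axis=1)


rng = np.random.default_rng(0)
layers = [rng.normal(size=(2, 4, 8)) for _ in range(3)]  # 3 layers, batch 2, 4 tokens, dim 8
mask = np.array([[1, 1, 1, 1],
                 [1, 1, 0, 0]])                          # second sample is padded
features = pool_layers(layers, mask)
print(features.shape)  # (2, 3, 8)
```

Each layer's slice, for example `features[:, layer]`, can then be compared against another model's pooled features with an alignment metric.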

Highlighted Details

  • Supports multiple alignment metrics including mutual_knn, cycle_knn, cka, and svcca.
  • Allows scoring custom models by providing extracted features.
  • Precomputed features for supported datasets are automatically downloaded.
  • Custom models can be added by editing tasks.py; Hugging Face autoregressive LLMs and ViT models are supported.
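To give a sense of what a metric such as cka computes, here is a sketch of linear CKA (centered kernel alignment with a linear kernel) between two feature matrices whose rows refer to the same samples. The choice of the linear variant is an assumption: the repository's `cka` option may use a different kernel or estimator, and `linear_cka` is a hypothetical name.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between feature matrices whose rows refer to the same samples."""
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows (samples)")
    # Center each feature dimension.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


rng = np.random.default_rng(1)
feats = rng.normal(size=(200, 32))
rotation, _ = np.linalg.qr(rng.normal(size=(32, 32)))
print(linear_cka(feats, 3.0 * feats @ rotation))       # rotated and rescaled copy: approximately 1
print(linear_cka(feats, rng.normal(size=(200, 16))))  # unrelated features: much lower
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling, so it compares the geometry of two representations rather than their particular basis, and it accepts matrices with different feature dimensions.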

Maintenance & Community

The project is associated with the paper "The Platonic Representation Hypothesis" presented at ICML 2024. No specific community channels or active maintenance signals are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes, and commercial use or closed-source linking compatibility is not specified.

Limitations & Caveats

Currently, only autoregressive language models and ViT vision architectures are supported. The project notes that the download URLs for precomputed features are occasionally unavailable, in which case users can recompute the features. Alignment scores may vary with numerical precision and batch size.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: inactive
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 0
  • Star history: 12 stars in the last 30 days
