platonic-rep by minyoungg

Research code for computing vision-language alignment scores

Created 1 year ago

634 stars

Top 52.3% on SourcePulse


1 Expert Loves This Project

Vincent Weisser

Cofounder of Prime Intellect

Project Summary

This project provides tools and precomputed features for evaluating the Platonic Representation Hypothesis, which posits that neural networks trained on different data and modalities, such as vision and language models, are converging toward similar representations. It is aimed at researchers and practitioners in computer vision and natural language processing who want to quantify the alignment between these modalities. With it, users can measure and compare vision-language alignment scores across models.

How It Works

The core approach extracts hidden states from specified layers of both vision and language models, pools these features, and then computes alignment scores with metrics such as mutual k-NN. This gives a quantitative measure of how similarly different models represent the same concepts across modalities. Precomputed features let users skip the extraction step.
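To make the mutual k-NN idea concrete, here is a minimal NumPy sketch. It is not the repository's implementation: the function name `mutual_knn_alignment`, the use of cosine similarity, and the default `k` are assumptions chosen for illustration. It takes two feature matrices computed on the same samples and reports the average fraction of nearest neighbors the two spaces share.

```python
import numpy as np


def mutual_knn_alignment(feats_a, feats_b, k=10):
    """Mean overlap of k-nearest-neighbor sets across two feature spaces.

    feats_a, feats_b: arrays of shape (n_samples, dim_a) and (n_samples, dim_b),
    where row i in both arrays describes the same underlying sample.
    (Hypothetical helper, written for illustration only.)
    """

    def knn_indices(feats):
        # L2-normalize rows so the dot product equals cosine similarity.
        normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        sim = normed @ normed.T
        # Exclude each sample from its own neighbor list.
        np.fill_diagonal(sim, -np.inf)
        return np.argsort(-sim, axis=1)[:, :k]

    nn_a = knn_indices(feats_a)
    nn_b = knn_indices(feats_b)
    # Fraction of shared neighbors per sample, averaged over all samples.
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_a, nn_b)]
    return float(np.mean(overlaps))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    vision_feats = rng.normal(size=(200, 64))    # stand-in for pooled vision features
    language_feats = rng.normal(size=(200, 32))  # stand-in for pooled language features
    print(mutual_knn_alignment(vision_feats, language_feats, k=10))
```

A score of 1 means every sample has the same k neighbors in both spaces; independent random features score close to k divided by the number of samples.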

Quick Start & Requirements

Install via pip: pip install -e .
Requirements: Python 3.11, PyTorch 2.2.0. Install others with pip install -r requirements.txt.
Precomputed features are available for download.
Examples for vision and language model feature extraction and scoring are provided.
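The pooling step that follows feature extraction can be sketched as below. This is an illustrative assumption, not code from the repository: `masked_mean_pool` and `cls_pool` are hypothetical helpers showing two common ways to collapse per-token hidden states into one vector per sample (averaging over non-padding tokens, or taking the first token).

```python
import numpy as np


def masked_mean_pool(hidden, mask):
    """Average hidden states over real (non-padding) tokens.

    hidden: (batch, tokens, dim) hidden states from one layer.
    mask:   (batch, tokens) with 1 for real tokens and 0 for padding.
    """
    mask = mask[..., None].astype(hidden.dtype)    # (batch, tokens, 1)
    summed = (hidden * mask).sum(axis=1)           # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return summed / counts


def cls_pool(hidden):
    """Use the first token's hidden state as the sample representation."""
    return hidden[:, 0, :]


if __name__ == "__main__":
    hidden = np.array([[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]])
    mask = np.array([[1, 1, 0]])  # the last token is padding
    print(masked_mean_pool(hidden, mask))  # averages only the first two tokens
    print(cls_pool(hidden))
```

Average pooling suits text models whose sequences vary in length, while first-token pooling is a common choice for ViT-style models that prepend a class token.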

Highlighted Details

Supports multiple alignment metrics including mutual_knn, cycle_knn, cka, and svcca.
Allows scoring custom models by providing extracted features.
Precomputed features for supported datasets are automatically downloaded.
Custom models can be added by modifying tasks.py, supporting Hugging Face autoregressive LLMs and ViT models.
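Of the metrics listed above, CKA is the simplest to illustrate. The sketch below implements the generic linear form of centered kernel alignment in NumPy. It is a textbook formulation and may differ from the variant used in the repository; the function name `linear_cka` is an assumption.

```python
import numpy as np


def linear_cka(x, y):
    """Linear centered kernel alignment between two feature matrices.

    x: (n_samples, dim_x), y: (n_samples, dim_y); rows must be paired samples.
    Returns a value in [0, 1], where 1 means the representations match up to
    rotation and isotropic scaling.
    """
    # Center each feature dimension across samples.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats_a = rng.normal(size=(100, 32))
    feats_b = rng.normal(size=(100, 48))
    print(linear_cka(feats_a, feats_a))  # identical features
    print(linear_cka(feats_a, feats_b))  # unrelated random features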

Maintenance & Community

The project is associated with the paper "The Platonic Representation Hypothesis" presented at ICML 2024. No specific community channels or active maintenance signals are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes, and compatibility with commercial use or closed-source linking is not specified.

Limitations & Caveats

Currently, only autoregressive language models and ViT vision architectures are supported. The project notes that download URLs for precomputed features may occasionally be down, in which case users can recompute them. Alignment scores may vary due to precision and batch-size differences.

Health Check

Last Commit

7 months ago

Responsiveness

Inactive

Star History

12 stars in the last 30 days