Research code for computing vision-language alignment scores
This project provides tools and precomputed features for evaluating the Platonic Representation Hypothesis, which posits that vision and language models learn similar representations. It is targeted at researchers and practitioners in computer vision and natural language processing who want to quantify the alignment between these modalities. The project enables users to measure and compare vision-language alignment scores across various models.
How It Works
The core approach extracts hidden states from specified layers of both vision and language models, pools these features, and computes alignment scores with metrics such as mutual k-NN. This gives a quantitative measure of how similarly different models represent concepts across modalities. Precomputed features let users skip the extraction step.
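The pooling and scoring steps described above can be sketched in NumPy. This is a minimal illustration, not the repository's implementation: the function names (mean_pool, knn_indices, mutual_knn), the choice of mean pooling, and the use of cosine similarity are all assumptions made for the example.

```python
import numpy as np

def mean_pool(hidden_states, mask):
    """Average token features over non-padded positions (assumed pooling choice).

    hidden_states: array of shape (n, tokens, dim)
    mask: array of shape (n, tokens), 1 for real tokens and 0 for padding
    """
    mask = np.asarray(mask)[..., None].astype(hidden_states.dtype)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # avoid division by zero
    return (hidden_states * mask).sum(axis=1) / counts

def knn_indices(feats, k):
    """Indices of each row's k nearest neighbours by cosine similarity, excluding itself."""
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own neighbour
    return np.argsort(-sim, axis=1)[:, :k]

def mutual_knn(feats_a, feats_b, k):
    """Mean fraction of k nearest neighbours shared between two feature spaces.

    Row i of feats_a and row i of feats_b must describe the same sample,
    for example an image and its caption.
    """
    nn_a = knn_indices(feats_a, k)
    nn_b = knn_indices(feats_b, k)
    shared = [len(set(a) & set(b)) for a, b in zip(nn_a, nn_b)]
    return float(np.mean(shared)) / k

# Hypothetical usage with random stand-ins for pooled model features.
rng = np.random.default_rng(0)
vision_feats = rng.standard_normal((100, 32))
text_feats = rng.standard_normal((100, 48))
score = mutual_knn(vision_feats, text_feats, k=10)
```

The score lies between 0 and 1. Comparing a feature matrix with itself gives 1, and the two feature spaces may have different dimensions because only neighbour identities are compared.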
Quick Start & Requirements
Install the package: pip install -e .
Install dependencies: pip install -r requirements.txt
Highlighted Details
Alignment metrics include mutual_knn, cycle_knn, cka, and svcca.
tasks.py, supporting Hugging Face autoregressive LLMs and ViT models.
Maintenance & Community
The project is associated with the paper "The Platonic Representation Hypothesis" presented at ICML 2024. No specific community channels or active maintenance signals are mentioned in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The code is provided for research purposes, and commercial use or closed-source linking compatibility is not specified.
Limitations & Caveats
Currently, only autoregressive language models and ViT vision architectures are supported. The project notes that download URLs for precomputed features may occasionally be down, in which case users can recompute them. Alignment scores may vary due to precision and batch-size differences.
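One way to see why precision can shift neighbour-based scores is to round features to half precision and check how many nearest-neighbour sets survive. The sketch below is an illustration under assumed settings (random features, cosine similarity, a float16 round trip). The helper names are hypothetical and it does not reproduce the repository's pipeline.

```python
import numpy as np

def topk_neighbors(feats, k):
    """Indices of each row's k nearest neighbours by cosine similarity, excluding itself."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)
    return np.argsort(-sim, axis=1)[:, :k]

def neighbor_agreement(feats, k=10):
    """Fraction of k-NN memberships preserved after a float16 round trip of the features."""
    full = topk_neighbors(feats.astype(np.float64), k)
    half = topk_neighbors(feats.astype(np.float16).astype(np.float64), k)
    shared = [len(set(a) & set(b)) for a, b in zip(full, half)]
    return float(np.mean(shared)) / k

# Random stand-in features; real model features may behave differently.
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 32))
agreement = neighbor_agreement(feats)
print(agreement)
```

A value below 1 means rounding reordered some near-tied neighbours, which is enough to change a k-NN based alignment score slightly.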