kohjingyu
Multimodal model for grounding language models to images
This repository provides code and model weights for FROMAGe, a system that grounds language models to images for multimodal inputs and outputs, as presented in an ICML 2023 paper. It enables text-to-image retrieval and image-conditioned text generation, targeting researchers and practitioners in multimodal AI.
How It Works
FROMAGe integrates visual information into large language models (LLMs) by adding trainable linear layers and a special "[RET]" embedding. This approach allows the LLM to condition its output on image content without requiring extensive retraining of the base LLM. The system leverages precomputed visual embeddings for efficient image retrieval.
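The retrieval path described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the repository's code: plain Python lists stand in for tensors, and the names `linear`, `cosine`, and `retrieve`, the toy weights, and the toy image index are all hypothetical. It assumes the hidden state at the "[RET]" position is projected by a trainable linear layer into a shared space and ranked against precomputed image embeddings by cosine similarity.

```python
import math

def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows (hypothetical helper)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(ret_hidden, weights, bias, image_index, top_k=1):
    """Project the [RET] hidden state and rank precomputed image embeddings."""
    query = linear(weights, bias, ret_hidden)
    ranked = sorted(
        image_index.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Toy values (assumptions for illustration only).
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # maps a 3-d hidden state to a 2-d retrieval space
b = [0.0, 0.0]
image_index = {  # stands in for the precomputed visual embeddings
    "cat.jpg": [1.0, 0.1],
    "dog.jpg": [0.0, 1.0],
    "car.jpg": [-1.0, 0.0],
}
ret_hidden = [0.9, 0.2, 0.5]  # stands in for the LLM hidden state at "[RET]"

print(retrieve(ret_hidden, W, b, image_index, top_k=2))
```

Only the small projection (`W`, `b`) would be trained in this sketch; the image embeddings are computed once and reused, which is what keeps retrieval cheap.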
Quick Start & Requirements
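Install dependencies: pip install -r requirements.txt
Add the repository to the Python path: export PYTHONPATH=$PYTHONPATH:/path/to/fromage/
Model weights: fromage_model/
Example notebook: FROMAGe_example_notebook.ipynb
Dataset files: dataset/ directory.
Highlighted Details
A model variant (fromage_vis4) with 4 visual tokens for improved dialogue performance.
Maintenance & Community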
Licensing & Compatibility
Limitations & Caveats
An environment variable may need to be set (export NCCL_P2P_DISABLE=1) for GPUs with less memory or if encountering issues.
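The NCCL_P2P_DISABLE setting can also be applied from inside a Python launcher rather than the shell. The sketch below is an assumption about how one might do this, not something the repository prescribes: `disable_nccl_p2p` is a hypothetical helper, and it relies on the variable being set before any distributed backend is initialized, since NCCL reads its environment at that point.

```python
import os

def disable_nccl_p2p():
    """Set NCCL_P2P_DISABLE=1 for this process and any child processes it spawns."""
    os.environ["NCCL_P2P_DISABLE"] = "1"
    return os.environ["NCCL_P2P_DISABLE"]

# Call this before importing or initializing any multi-GPU training code.
disable_nccl_p2p()
```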