Multimodal model for augmenting generative language models
MAGMA is a multimodal model that enables generative language models to understand and process combinations of images and text. It is designed for researchers and developers working on vision-language tasks, offering a simplified approach to multimodal integration and achieving state-of-the-art results on benchmarks like OKVQA.
How It Works
MAGMA uses adapter-based finetuning to augment an existing generative language model with visual understanding. The core language model weights stay frozen, preserving its pre-trained knowledge and in-context learning abilities, while only lightweight adapter layers are trained. This simplifies pre-training: the whole system is trained end-to-end with a single language-modeling objective rather than through a multi-step pipeline. A minimal sketch of the adapter idea follows.
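The snippet below is an illustrative sketch of a residual bottleneck adapter wrapped around a frozen sub-layer, showing how gradients flow only through the adapter. The module layout, dimensions, and names are assumptions for illustration, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project, skip connection."""
    def __init__(self, hidden_dim: int, downsample_factor: int = 4):
        super().__init__()
        bottleneck = hidden_dim // downsample_factor
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection means the frozen model's behaviour is preserved
        # when the adapter output is near zero.
        return x + self.up(self.act(self.down(x)))

# Toy frozen "sub-layer" standing in for a transformer block of the language model.
hidden_dim = 64
frozen_sublayer = nn.Linear(hidden_dim, hidden_dim)
for p in frozen_sublayer.parameters():
    p.requires_grad = False  # base LM weights stay frozen

adapter = BottleneckAdapter(hidden_dim)  # only these weights receive gradients
x = torch.randn(2, 10, hidden_dim)
out = adapter(frozen_sublayer(x))
print(out.shape)  # torch.Size([2, 10, 64])
```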
Quick Start & Requirements
pip install git+https://github.com/Aleph-Alpha/magma.git
mkdir -p configs && wget -O configs/MAGMA_v1.yml https://raw.githubusercontent.com/Aleph-Alpha/magma/master/configs/MAGMA_v1.yml
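With the package installed and the config downloaded, inference follows roughly the pattern below. This is a hedged sketch based on the project's documented interface; the checkpoint path, image URL, and generation parameters are placeholders, and you should verify the API names against the current version of the repository.

```python
from magma import Magma
from magma.image_input import ImageInput

# Load the model from the downloaded config and a checkpoint file
# (checkpoint path and device are placeholders).
model = Magma.from_checkpoint(
    config_path="configs/MAGMA_v1.yml",
    checkpoint_path="./mp_rank_00_model_states.pt",
    device="cuda:0",
)

# Inputs interleave images (URL or local path) with text prompts.
inputs = [
    ImageInput("https://example.com/painting.jpg"),
    "Describe the painting:",
]

# Embed the multimodal prompt, then generate a text continuation.
embeddings = model.preprocess_inputs(inputs)
output = model.generate(
    embeddings=embeddings,
    max_steps=6,
    temperature=0.7,
    top_k=0,
)
print(output[0])
```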
Maintenance & Community
The project is associated with Aleph Alpha and researchers from Heidelberg University. Further details on the latest models and services can be found on the Aleph Alpha website.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
The freely available model is a demo; for advanced capabilities, users are directed to Aleph Alpha's commercial offerings. Training MAGMA from scratch requires manually downloading and integrating GPT-J weights.
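If you do train from scratch, one way to obtain GPT-J weights is through Hugging Face Transformers, as sketched below. The checkpoint name and the saved file layout here are assumptions; MAGMA may expect a different checkpoint format, so consult the repository's training instructions for the authoritative procedure.

```python
# Hypothetical helper: download GPT-J weights and save a local state dict.
import torch
from transformers import GPTJForCausalLM

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
torch.save(model.state_dict(), "gptj_weights.pt")
```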