magma by Aleph-Alpha-Research

Multimodal model for augmenting generative language models

created 3 years ago
489 stars

Top 64.0% on sourcepulse

Project Summary

MAGMA is a multimodal model that lets generative language models understand and process combinations of images and text. It targets researchers and developers working on vision-language tasks, offers a simpler approach to multimodal integration, and reports state-of-the-art results on the OKVQA benchmark.

How It Works

MAGMA uses adapter-based finetuning to add visual understanding to an existing generative language model. The core language model weights stay frozen, which preserves the model's pre-trained knowledge and in-context learning abilities. This adapter-only approach also simplifies pre-training: the model is trained end-to-end with a single language modeling objective, which is more efficient than multi-step methods.
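The idea of training small adapters around a frozen layer can be illustrated with a minimal, framework-free sketch. This is not MAGMA's code: the class names (`FrozenLayer`, `BottleneckAdapter`, `AdaptedLayer`), the bottleneck shape, the adapter placement after the layer, and the zero-initialized up-projection are all assumptions chosen for illustration. A real implementation would use PyTorch modules instead of plain lists.

```python
import random


def linear(weights, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def count_parameters(matrix):
    return sum(len(row) for row in matrix)


class FrozenLayer:
    """Stand-in for one pre-trained language model layer; never updated."""

    def __init__(self, dim, rng):
        self.weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return linear(self.weights, x)


class BottleneckAdapter:
    """Small trainable module: down-project, ReLU, up-project, residual add."""

    def __init__(self, dim, bottleneck, rng):
        self.down = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
        # Assumption: a zero-initialized up-projection makes the adapter start
        # as the identity, so the frozen layer's behavior is unchanged at first.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        delta = linear(self.up, relu(linear(self.down, h)))
        return [hi + di for hi, di in zip(h, delta)]


class AdaptedLayer:
    """Frozen layer followed by an adapter (hypothetical placement)."""

    def __init__(self, frozen, adapter):
        self.frozen = frozen
        self.adapter = adapter

    def __call__(self, x):
        return self.adapter(self.frozen(x))

    def trainable_parameters(self):
        # Only the adapter matrices would be handed to an optimizer.
        return [self.adapter.down, self.adapter.up]


rng = random.Random(0)
frozen = FrozenLayer(dim=8, rng=rng)
adapter = BottleneckAdapter(dim=8, bottleneck=2, rng=rng)
layer = AdaptedLayer(frozen, adapter)
x = [1.0] * 8

trainable = sum(count_parameters(m) for m in layer.trainable_parameters())
print("frozen parameters:", count_parameters(frozen.weights))  # 64
print("trainable parameters:", trainable)  # 32
```

In this toy setup the adapter adds 32 trainable values next to 64 frozen ones, and the adapted layer initially reproduces the frozen layer's output exactly. The ratio in a real model is far more lopsided in favor of the frozen weights.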

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/Aleph-Alpha/magma.git
  • Requires PyTorch (>= 1.9.0) and Torchvision.
  • Download configuration: wget -O configs/MAGMA_v1.yml https://raw.githubusercontent.com/Aleph-Alpha/magma/master/configs/MAGMA_v1.yml
  • Official documentation and examples are available on the GitHub repository.
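Before installing, it can help to confirm that an existing PyTorch installation satisfies the `>= 1.9.0` requirement. The sketch below is a hypothetical helper, not part of MAGMA: the function names `parse_version` and `meets_minimum` are invented for illustration. It compares versions numerically, because a plain string comparison would wrongly rank `1.10.0` below `1.9.0`.

```python
import re


def parse_version(version):
    """Return the leading numeric components, e.g. '1.13.1+cu117' -> (1, 13, 1)."""
    public = version.split("+", 1)[0]  # drop a local build tag such as '+cu117'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="1.9.0"):
    """Check installed >= minimum, padding the shorter version with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("1.10.0"))        # True
print(meets_minimum("1.8.1"))         # False
print(meets_minimum("1.13.1+cu117"))  # True
print("1.10.0" >= "1.9.0")            # False: why string comparison is unsafe
```

With PyTorch installed, the installed version string is available as `torch.__version__` and could be passed to `meets_minimum` directly.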

Highlighted Details

  • Outperforms Frozen on open-ended generative tasks.
  • Achieves state-of-the-art on the OKVQA benchmark.
  • Pretrained on 0.2% of the number of samples used to train SimVLM, according to the MAGMA paper.
  • Supports arbitrary combinations of visual and textual input for autoregressive text generation.
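One way to picture "arbitrary combinations of visual and textual input" is as a single ordered sequence of embedding vectors that an autoregressive model then continues. The sketch below is a toy illustration under stated assumptions, not MAGMA's API: `ImageRef`, `embed_text`, `embed_image`, `build_sequence`, the embedding size, and the fixed number of vectors per image are all hypothetical, and the "embedders" return placeholder values rather than learned features.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union

Vector = List[float]

EMBED_DIM = 4          # assumed toy embedding size
IMAGE_PREFIX_LEN = 3   # assumed number of vectors produced per image


@dataclass(frozen=True)
class ImageRef:
    """Hypothetical placeholder for an image given by path or URL."""
    location: str


PromptItem = Union[str, ImageRef]


def embed_text(text: str) -> List[Vector]:
    """Toy text embedder: one vector per whitespace-separated token."""
    return [[float(len(token))] * EMBED_DIM for token in text.split()]


def embed_image(image: ImageRef) -> List[Vector]:
    """Toy image embedder: a fixed number of placeholder vectors per image."""
    return [[-1.0] * EMBED_DIM for _ in range(IMAGE_PREFIX_LEN)]


def build_sequence(items: Sequence[PromptItem]) -> List[Vector]:
    """Embed each item and concatenate the results in prompt order."""
    sequence: List[Vector] = []
    for item in items:
        if isinstance(item, ImageRef):
            sequence.extend(embed_image(item))
        elif isinstance(item, str):
            sequence.extend(embed_text(item))
        else:
            raise TypeError(f"unsupported prompt item: {item!r}")
    return sequence


prompt = [
    ImageRef("painting.jpg"),
    "Describe the painting:",
    ImageRef("photo.jpg"),
    "Compare them.",
]
sequence = build_sequence(prompt)
print(len(sequence))  # 11 = 3 (image) + 3 (tokens) + 3 (image) + 2 (tokens)
```

Because images and text end up in the same sequence, their order in the prompt is preserved, which is what allows few-shot prompts that alternate images and captions.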

Maintenance & Community

The project is associated with Aleph Alpha and researchers from Heidelberg University. Further details on the latest models and services can be found on the Aleph Alpha website.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The freely available model is a demo; for advanced capabilities, users are directed to Aleph Alpha's commercial offerings. Training MAGMA from scratch requires manually downloading and integrating GPT-J weights.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
