magma by Aleph-Alpha-Research

Multimodal model for augmenting generative language models

Created 4 years ago
489 stars

Top 63.1% on SourcePulse

Project Summary

MAGMA is a multimodal model that lets a generative language model take combinations of images and text as input. It targets researchers and developers working on vision-language tasks and offers a simplified approach to multimodal integration. The authors report state-of-the-art results on the OKVQA benchmark.

How It Works

MAGMA uses adapter-based finetuning to add visual understanding to an existing generative language model. The core language model weights stay frozen, which preserves the model's pre-trained knowledge and in-context learning abilities. Because the language model itself is not updated, pre-training runs end to end with a single language modeling objective, which the project presents as simpler and more efficient than multi-step methods.
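The frozen-model-plus-adapter idea can be illustrated with a small, dependency-free sketch. This is not MAGMA's implementation: `FrozenLinear`, `BottleneckAdapter`, the bottleneck size, and the zero initialization of the up-projection are all assumptions chosen for illustration. A real setup would use PyTorch modules and disable gradients on the language model.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, value) for value in vector]


class FrozenLinear:
    """Hypothetical stand-in for one pretrained language-model layer; never updated."""

    trainable = False

    def __init__(self, dim, rng):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.weights, x)

    def num_params(self):
        return len(self.weights) * len(self.weights[0])


class BottleneckAdapter:
    """Hypothetical residual adapter: h + up(relu(down(h)))."""

    trainable = True

    def __init__(self, dim, bottleneck, rng):
        self.down = [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
        # Assumption: zero-initialized up-projection, so the adapter starts as
        # an identity function and does not disturb the frozen layer's output.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, h):
        delta = matvec(self.up, relu(matvec(self.down, h)))
        return [a + d for a, d in zip(h, delta)]

    def num_params(self):
        return 2 * len(self.down) * len(self.down[0])


def trainable_parameter_count(modules):
    """Count only the parameters an optimizer would be allowed to update."""
    return sum(m.num_params() for m in modules if m.trainable)


if __name__ == "__main__":
    rng = random.Random(0)
    layer = FrozenLinear(8, rng)
    adapter = BottleneckAdapter(8, 2, rng)
    hidden = layer([1.0] * 8)
    print("adapter is identity at init:", adapter(hidden) == hidden)
    print("trainable parameters:", trainable_parameter_count([layer, adapter]))
```

Only the adapter's parameters count as trainable, so an optimizer built from them leaves the pretrained layer untouched. That is the property the paragraph above relies on.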

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/Aleph-Alpha/magma.git
  • Requires PyTorch (>= 1.9.0) and Torchvision.
  • Download configuration: wget -O configs/MAGMA_v1.yml https://raw.githubusercontent.com/Aleph-Alpha/magma/master/configs/MAGMA_v1.yml
  • Official documentation and examples are available on the GitHub repository.
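Before installing, it can help to confirm that the local environment meets the PyTorch requirement listed above. The following is a minimal sketch using only the Python standard library. The helper names (`parse_version`, `meets_minimum`, `installed_version`) are hypothetical, and the parsing is deliberately simple: it treats a pre-release such as `1.9.0a0` the same as `1.9.0`.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_TORCH = (1, 9, 0)  # minimum PyTorch version listed in the requirements


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a string such as '2.1.0+cu118' into a tuple of leading integers."""
    public = version.split("+", 1)[0]  # drop a local build tag such as '+cu118'
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...] = MIN_TORCH) -> bool:
    """Compare numerically, so '1.10.0' correctly counts as newer than '1.9.0'."""
    parts = parse_version(version)
    padded = parts + (0,) * (len(minimum) - len(parts))  # '1.9' -> (1, 9, 0)
    return padded >= minimum


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_version = installed_version("torch")
    if torch_version is None:
        print("torch is not installed")
    else:
        print("torch", torch_version, "ok" if meets_minimum(torch_version) else "too old")
    print("torchvision installed:", installed_version("torchvision") is not None)
```

Comparing integer tuples instead of raw strings avoids the common mistake where `"1.10.0" < "1.9.0"` evaluates to true under string comparison.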

Highlighted Details

  • Outperforms Frozen on open-ended generative tasks.
  • Reported state-of-the-art results on the OKVQA benchmark.
  • Pretrained on 0.2% of the number of samples used to train SimVLM.
  • Supports arbitrary combinations of visual and textual input for autoregressive text generation.
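The last point, mixing images and text freely in one input, can be pictured as turning each item into a run of embedding vectors and concatenating the runs in input order, so the language model sees a single sequence to continue. The toy sketch below only illustrates that idea. `ToyImage`, `embed_text`, `embed_image`, the embedding width, and the number of vectors per image are all hypothetical and are not the library's API.

```python
from dataclasses import dataclass
from typing import List, Union

EMBED_DIM = 4         # toy width; real models use thousands of dimensions
IMAGE_PREFIX_LEN = 3  # hypothetical number of vectors produced per image


@dataclass
class ToyImage:
    """Placeholder for an image input (hypothetical, not the library's class)."""

    name: str


def embed_text(text: str) -> List[List[float]]:
    # One dummy vector per whitespace-separated token.
    return [[float(len(token))] * EMBED_DIM for token in text.split()]


def embed_image(image: ToyImage) -> List[List[float]]:
    # A fixed-length run of dummy vectors standing in for visual features.
    return [[-1.0] * EMBED_DIM for _ in range(IMAGE_PREFIX_LEN)]


def build_sequence(inputs: List[Union[ToyImage, str]]) -> List[List[float]]:
    """Concatenate per-item embeddings in the order the items were given."""
    sequence: List[List[float]] = []
    for item in inputs:
        if isinstance(item, ToyImage):
            sequence.extend(embed_image(item))
        elif isinstance(item, str):
            sequence.extend(embed_text(item))
        else:
            raise TypeError(f"unsupported input type: {type(item).__name__}")
    return sequence


if __name__ == "__main__":
    mixed = [ToyImage("painting"), "Describe the painting:", ToyImage("photo"), "and this one"]
    print("sequence length:", len(build_sequence(mixed)))
```

Because every item ends up as vectors of the same width, the order and number of images and text spans is unconstrained, and an autoregressive decoder can generate text conditioned on the whole sequence.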

Maintenance & Community

The project is associated with Aleph Alpha and researchers from Heidelberg University. Further details on the latest models and services can be found on the Aleph Alpha website.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The freely available model is a demo; for advanced capabilities, users are directed to Aleph Alpha's commercial offerings. Training MAGMA from scratch requires manually downloading and integrating GPT-J weights.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days
