magma by Aleph-Alpha-Research

Multimodal model for augmenting generative language models

Created 3 years ago
489 stars

Top 63.1% on SourcePulse

Project Summary

MAGMA is a multimodal model that enables generative language models to understand and process combinations of images and text. It targets researchers and developers working on vision-language tasks, offers a simplified approach to multimodal integration, and achieves state-of-the-art results on the OKVQA benchmark.

How It Works

MAGMA uses adapter-based finetuning to augment an existing generative language model with visual understanding. Crucially, the core language model weights stay frozen, which preserves the model's pre-trained knowledge and in-context learning abilities. This adapter-only approach also simplifies pre-training: the model is trained end to end with a single language modeling objective, which is more efficient than multi-step methods.
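The frozen-weights-plus-adapter idea can be illustrated with a minimal sketch. It is not MAGMA's actual implementation: the class names (FrozenLinear, BottleneckAdapter, AdaptedLayer), the residual bottleneck design, and the zero-initialised up-projection are assumptions chosen for illustration, and plain Python lists stand in for tensors so the example runs without PyTorch.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (stored as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class FrozenLinear:
    """Hypothetical stand-in for a pretrained language-model layer; never updated."""

    def __init__(self, weights):
        self.weights = weights
        self.trainable = False

    def __call__(self, hidden):
        return matvec(self.weights, hidden)


class BottleneckAdapter:
    """Hypothetical residual adapter: hidden + up(relu(down(hidden)))."""

    def __init__(self, dim, bottleneck, rng):
        # Small random down-projection into the bottleneck dimension.
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                     for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter starts as the identity,
        # so the frozen model's behaviour is unchanged before training.
        self.up = [[0.0] * bottleneck for _ in range(dim)]
        self.trainable = True

    def __call__(self, hidden):
        squeezed = [max(0.0, v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, squeezed)
        return [h + d for h, d in zip(hidden, delta)]


class AdaptedLayer:
    """Frozen layer followed by a trainable adapter."""

    def __init__(self, frozen, adapter):
        self.frozen = frozen
        self.adapter = adapter

    def __call__(self, hidden):
        return self.adapter(self.frozen(hidden))

    def trainable_modules(self):
        # Only modules flagged as trainable would receive gradient updates.
        return [m for m in (self.frozen, self.adapter) if m.trainable]


rng = random.Random(0)
frozen = FrozenLinear([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
layer = AdaptedLayer(frozen, BottleneckAdapter(dim=3, bottleneck=2, rng=rng))
hidden = [1.0, 1.0, 1.0]
output = layer(hidden)
```

At initialisation the adapted layer reproduces the frozen layer's output exactly, and only the adapter appears in trainable_modules(). In a real framework the same effect is usually achieved by disabling gradients on the language model's parameters and passing only the adapter parameters to the optimizer.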

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/Aleph-Alpha/magma.git
  • Requires PyTorch (>= 1.9.0) and Torchvision.
  • Download configuration (the configs/ directory must already exist, since wget -O does not create it): wget -O configs/MAGMA_v1.yml https://raw.githubusercontent.com/Aleph-Alpha/magma/master/configs/MAGMA_v1.yml
  • Official documentation and examples are available on the GitHub repository.
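Before installing, it can help to confirm that the local PyTorch version satisfies the >= 1.9.0 requirement. The sketch below is an illustrative helper, not part of MAGMA: the names MIN_TORCH, parse_version, and meets_minimum are hypothetical. It compares versions as integer tuples because plain string comparison orders "1.10.0" before "1.9.0".

```python
import re

# Minimum PyTorch release stated in the requirements above.
MIN_TORCH = (1, 9, 0)


def parse_version(text):
    """Return the leading numeric release of a version string as an int tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    # A missing patch component (e.g. "2.1") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, minimum=MIN_TORCH):
    """True if the installed version is at least the required minimum."""
    return parse_version(installed) >= minimum
```

With PyTorch installed, the check would be called as meets_minimum(torch.__version__); local build suffixes such as "+cu113" are ignored by the parser.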

Highlighted Details

  • Outperforms Frozen on open-ended generative tasks.
  • Achieves state-of-the-art results on the OKVQA benchmark.
  • Pretrained on significantly fewer samples than SimVLM.
  • Supports arbitrary combinations of visual and textual input for autoregressive text generation.

Maintenance & Community

The project is associated with Aleph Alpha and researchers from Heidelberg University. Further details on the latest models and services can be found on the Aleph Alpha website.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The freely available model is a demo; for advanced capabilities, users are directed to Aleph Alpha's commercial offerings. Training MAGMA from scratch requires manually downloading and integrating GPT-J weights.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Travis Fischer (Founder of Agentic), and 5 more.

fromage by kohjingyu

0%
482
Multimodal model for grounding language models to images
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

open_flamingo by mlfoundations

0.1%
4k
Open-source framework for training large multimodal models
Created 2 years ago
Updated 1 year ago