Multimodal model for augmenting generative language models
MAGMA is a multimodal model that enables generative language models to understand and process combinations of images and text. It is designed for researchers and developers working on vision-language tasks, offering a simplified approach to multimodal integration and achieving state-of-the-art results on benchmarks like OKVQA.
How It Works
MAGMA uses adapter-based finetuning to augment an existing generative language model with visual understanding. The core language model weights stay frozen, preserving its pre-trained knowledge and in-context learning abilities, while only lightweight adapter layers are trained. This simplifies pre-training: the whole system is trained end-to-end with a single language-modeling objective rather than through a multi-step pipeline. A minimal sketch of the adapter idea follows.
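The snippet below is an illustrative sketch of a residual bottleneck adapter wrapped around a frozen sub-layer, showing how gradients flow only through the adapter. The module layout, dimensions, and names are assumptions for illustration, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Residual bottleneck adapter: down-project, nonlinearity, up-project, skip connection."""
    def __init__(self, hidden_dim: int, downsample_factor: int = 4):
        super().__init__()
        bottleneck = hidden_dim // downsample_factor
        self.down = nn.Linear(hidden_dim, bottleneck)
        self.act = nn.ReLU()
        self.up = nn.Linear(bottleneck, hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection means the frozen model's behaviour is preserved
        # when the adapter output is near zero.
        return x + self.up(self.act(self.down(x)))

# Toy frozen "sub-layer" standing in for a transformer block of the language model.
hidden_dim = 64
frozen_sublayer = nn.Linear(hidden_dim, hidden_dim)
for p in frozen_sublayer.parameters():
    p.requires_grad = False  # base LM weights stay frozen

adapter = BottleneckAdapter(hidden_dim)  # only these weights receive gradients
x = torch.randn(2, 10, hidden_dim)
out = adapter(frozen_sublayer(x))
print(out.shape)  # torch.Size([2, 10, 64])
```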
Quick Start & Requirements
pip install git+https://github.com/Aleph-Alpha/magma.git
mkdir -p configs && wget -O configs/MAGMA_v1.yml https://raw.githubusercontent.com/Aleph-Alpha/magma/master/configs/MAGMA_v1.yml
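With the package installed and the config downloaded, inference follows roughly the pattern below. This is a hedged sketch based on the project's documented interface; the checkpoint path, image URL, and generation parameters are placeholders, and you should verify the API names against the current version of the repository.

```python
from magma import Magma
from magma.image_input import ImageInput

# Load the model from the downloaded config and a checkpoint file
# (checkpoint path and device are placeholders).
model = Magma.from_checkpoint(
    config_path="configs/MAGMA_v1.yml",
    checkpoint_path="./mp_rank_00_model_states.pt",
    device="cuda:0",
)

# Inputs interleave images (URL or local path) with text prompts.
inputs = [
    ImageInput("https://example.com/painting.jpg"),
    "Describe the painting:",
]

# Embed the multimodal prompt, then generate a text continuation.
embeddings = model.preprocess_inputs(inputs)
output = model.generate(
    embeddings=embeddings,
    max_steps=6,
    temperature=0.7,
    top_k=0,
)
print(output[0])
```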
Maintenance & Community
The project is associated with Aleph Alpha and researchers from Heidelberg University. Further details on the latest models and services can be found on the Aleph Alpha website.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
The freely available model is a demo; for advanced capabilities, users are directed to Aleph Alpha's commercial offerings. Training MAGMA from scratch requires manually downloading and integrating GPT-J weights.
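If you do train from scratch, one way to obtain GPT-J weights is through Hugging Face Transformers, as sketched below. The checkpoint name and the saved file layout here are assumptions; MAGMA may expect a different checkpoint format, so consult the repository's training instructions for the authoritative procedure.

```python
# Hypothetical helper: download GPT-J weights and save a local state dict.
import torch
from transformers import GPTJForCausalLM

model = GPTJForCausalLM.from_pretrained("EleutherAI/gpt-j-6B", torch_dtype=torch.float16)
torch.save(model.state_dict(), "gptj_weights.pt")
```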