CM3Leon by kyegomez

Open-source implementation of a multimodal AI research paper

created 2 years ago
362 stars

Top 78.7% on sourcepulse

Project Summary

CM3Leon is an open-source implementation of a multimodal AI model capable of generating both text and images autoregressively. It targets researchers and developers working on advanced generative models, offering a unified decoder for diverse content creation. The project aims to provide a state-of-the-art, computationally efficient alternative for multimodal generation tasks.

How It Works

CM3Leon employs a decoder-only transformer architecture, similar to GPT models, but extended for multimodal inputs. It utilizes a two-stage training process: retrieval-augmented pretraining on a large, diverse dataset and supervised fine-tuning on specific text-image tasks. A key innovation is contrastive decoding, which enhances the quality and coherence of generated samples by balancing conditional and unconditional generation streams.
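
The balancing of conditional and unconditional streams can be illustrated with a short sketch. It uses the classifier-free-guidance-style mix `uncond + scale * (cond - uncond)`, which is one common way to combine two logit streams. That formula is an assumption here and not necessarily what this repository implements; the function names, the `scale` parameter, and the toy logits are all hypothetical.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mix_logits(cond, uncond, scale):
    """Hypothetical helper: push the conditional stream away from the
    unconditional one. scale=1.0 returns the conditional logits unchanged;
    larger values amplify what the prompt contributes."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

# Toy next-token logits over a 3-token vocabulary (made-up numbers).
cond_logits = [2.0, 1.0, 0.0]    # model conditioned on the prompt
uncond_logits = [1.0, 1.0, 1.0]  # same model with the prompt dropped

mixed = mix_logits(cond_logits, uncond_logits, scale=3.0)
probs = softmax(mixed)

# Greedy pick of the next token from the mixed distribution.
next_token = max(range(len(probs)), key=probs.__getitem__)
print(mixed, next_token)
```

With `scale=1.0` the mix reduces to ordinary conditional decoding, so the scale acts as a single knob for how strongly the prompt steers each step.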

Quick Start & Requirements

  • Install: pip3 install cm3
  • Requirements: PyTorch environment, significant GPU/TPU resources for training, large multimodal datasets (e.g., Shutterstock), custom tokenizer implementation, retrieval infrastructure, and fine-tuning frameworks.
  • Links: PAPER LINK

Highlighted Details

  • Achieves state-of-the-art text-to-image generation while using 5x less compute than comparable models.
  • Employs retrieval-augmented pretraining and contrastive decoding for improved sample quality.
  • Supports model sizes ranging from 350M to 7B parameters.
  • Uses custom tokenizers for text (CommonCrawl) and images (256x256 encoded into 1024 tokens).
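
As a rough check on the image tokenizer figures above: if the 1024 tokens are assumed to form a square grid (an assumption; the summary gives only the totals), a 256x256 image maps to a 32x32 grid in which each token covers an 8x8 pixel patch. The helper name below is hypothetical.

```python
import math

def token_grid(image_side, num_tokens):
    """Hypothetical helper: given a square image and a token budget,
    return (tokens per side, pixels per side covered by one token),
    assuming the tokens tile the image as a square grid."""
    grid_side = math.isqrt(num_tokens)
    if grid_side * grid_side != num_tokens:
        raise ValueError("num_tokens is not a perfect square")
    if image_side % grid_side != 0:
        raise ValueError("image side is not divisible by the grid side")
    return grid_side, image_side // grid_side

grid_side, patch_side = token_grid(256, 1024)
print(f"{grid_side}x{grid_side} tokens, {patch_side}x{patch_side} pixels each")
```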

Maintenance & Community

  • The project is marked as "wip" (work in progress) and contributions are welcomed via pull requests and issues.
  • Support is available through the GitHub issue tracker.

Licensing & Compatibility

  • Licensed under the MIT license.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The repository describes itself as "not finished" (wip). Replicating the model requires substantial expertise in distributed training, data pipelines, and optimization techniques, along with significant computational infrastructure.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
