diffusion-gpt by ash80

Character-level text generation using discrete diffusion

Created 8 months ago

257 stars

Top 98.3% on SourcePulse

Project Summary

This repository provides an annotated Jupyter Notebook that implements a character-level discrete diffusion model for text generation, adapted from Andrej Karpathy's baby GPT. It is aimed at researchers and practitioners who want to explore diffusion models as an alternative to autoregressive language models. The notebook demonstrates parallel token generation and doubles as an educational resource.

How It Works

The project adapts a 7.23M-parameter character-level GPT architecture to a discrete diffusion framework. The model learns to denoise corrupted text sequences by modeling the score function of the data distribution, and it is trained with a score-entropy-based objective. At inference time, the Discrete Tweedie Sampler inverts a token-flipping noising process and updates all tokens in parallel, in contrast to autoregressive methods, which generate one token at a time.
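
To make the token-flipping noising process concrete, here is a minimal Python sketch. It is not taken from the notebook. It assumes a uniform corruption in which each character is independently resampled from the vocabulary with probability 1 - exp(-sigma_bar); the function name `corrupt`, the parameter `sigma_bar`, and the toy vocabulary are all hypothetical.

```python
import math
import random


def corrupt(text, vocab, sigma_bar, rng):
    """Corrupt `text` by uniform token flipping (an assumed noising process).

    Each character is kept with probability exp(-sigma_bar); otherwise it is
    replaced by a character drawn uniformly from `vocab` (which may happen to
    be the same character again).
    """
    p_resample = 1.0 - math.exp(-sigma_bar)
    out = []
    for ch in text:
        if rng.random() < p_resample:
            out.append(rng.choice(vocab))  # flip to a uniformly random token
        else:
            out.append(ch)                 # leave the token untouched
    return "".join(out)


if __name__ == "__main__":
    sample = "to be, or not to be"
    vocab = sorted(set(sample))            # toy character vocabulary
    rng = random.Random(0)                 # seeded for repeatable runs
    for sigma_bar in (0.0, 0.5, 5.0):
        print(sigma_bar, repr(corrupt(sample, vocab, sigma_bar, rng)))
```

With `sigma_bar = 0` the text comes back unchanged, and as `sigma_bar` grows the output approaches uniformly random characters. A denoising model in this setting would be trained to undo such corruption at every position at once.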

Quick Start & Requirements

The primary way to use the project is to run the provided Jupyter Notebook, either in Google Colab or in a local Jupyter instance. Users can optionally modify the dataset, noise schedule, or model size for experimentation. The README mentions no specific hardware prerequisites such as a GPU; a standard Python environment capable of running Jupyter notebooks is assumed. Links to the original nanoGPT and the relevant research paper are provided for deeper context.
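
As an illustration of what changing the noise schedule might involve, the sketch below defines a geometric schedule that interpolates the total noise between two endpoints. This is an assumption for illustration only: the summary does not say which schedule the notebook uses, and the name `geometric_sigma_bar` and the default values of `sigma_min` and `sigma_max` are hypothetical placeholders.

```python
def geometric_sigma_bar(t, sigma_min=1e-3, sigma_max=20.0):
    """Total noise at time t in [0, 1] under an assumed geometric schedule.

    Interpolates geometrically from sigma_min (at t = 0, almost clean text)
    to sigma_max (at t = 1, almost fully corrupted text).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return sigma_min ** (1.0 - t) * sigma_max ** t


if __name__ == "__main__":
    # Print the schedule at a few evenly spaced times.
    for step in range(5):
        t = step / 4
        print(f"t={t:.2f}  sigma_bar={geometric_sigma_bar(t):.4f}")
```

Raising `sigma_max` pushes the final state closer to pure noise, while lowering `sigma_min` makes the earliest steps closer to the clean data. Either change alters how the corruption is spread across time steps.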

Highlighted Details

Features a single, self-contained Jupyter Notebook for theory and implementation.
Implements a character-level discrete diffusion model for text generation.
Adapts Andrej Karpathy's 7.23M parameter character-level baby GPT.
Generates text by denoising all tokens in parallel, a departure from autoregressive models.
Covers the mathematical framework, score-entropy objective, and Discrete Tweedie Sampler for inference.
Demonstrates training on Shakespeare's text.

Maintenance & Community

The project is authored by Ashwani Kumar. The provided README does not detail any community channels (e.g., Discord, Slack), roadmap, or sponsorship information.

Licensing & Compatibility

The README does not explicitly state the software license. Users should therefore confirm the licensing terms before relying on the code, especially for commercial use or integration into closed-source projects.

Limitations & Caveats

The project is presented as an educational guide and research starting point, rather than a production-ready system. Its focus is on character-level generation, and it relies on a Jupyter Notebook environment, which may not be suitable for all deployment scenarios.

Explore Similar Projects

diffusion-nlp-paper-arxiv by bansky-cl

awesome-discrete-diffusion-models by kuleshov-group

minimal-text-diffusion by madaan

Compositional-Visual-Generation-with-Composable-Diffusion-Models-PyTorch by energy-based-model

MinerU-Diffusion by opendatalab

Awesome-DLMs by VILA-Lab

ELF by lillian039

tiny-diffusion by nathan-barry

mdlm by kuleshov-group

KoGPT2 by SKT-AI

ProphetNet by microsoft

nlp-journey by msgi