diffusion-gpt  by ash80

Character-level text generation using discrete diffusion

Created 5 months ago
252 stars

Top 99.6% on SourcePulse

Project Summary

This repository provides an annotated Jupyter Notebook that implements a character-level discrete diffusion model for text generation, adapted from Andrej Karpathy's baby GPT. It is aimed at researchers and practitioners who want to explore diffusion models as an alternative to autoregressive language models. The model generates tokens in parallel, and the notebook doubles as an educational resource.

How It Works

The project adapts a 7.23M-parameter character-level GPT architecture to a discrete diffusion framework. Text is corrupted by a token-flipping noising process, and the model learns to invert that process: it denoises corrupted sequences by modeling the score function of the data distribution, trained with a score-entropy-based objective. At inference time, the Discrete Tweedie Sampler denoises all tokens in parallel, in contrast to autoregressive methods, which generate one token at a time.
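To make the token-flipping noising process concrete, here is a minimal sketch in plain Python. It assumes the simplest possible corruption rule: each character is independently replaced, with probability `flip_prob`, by a character drawn uniformly from the vocabulary. The notebook's actual transition rule and noise schedule are not described in this summary, so the names `build_vocab`, `corrupt`, and `flip_prob` are hypothetical and exist only for illustration.

```python
import random


def build_vocab(text):
    """Map each distinct character to an integer id and back."""
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    itos = {i: ch for ch, i in stoi.items()}
    return stoi, itos


def corrupt(tokens, flip_prob, vocab_size, rng):
    """Assumed noising rule: resample each token uniformly with probability flip_prob."""
    noisy = []
    for tok in tokens:
        if rng.random() < flip_prob:
            # The resampled token may coincide with the original by chance.
            noisy.append(rng.randrange(vocab_size))
        else:
            noisy.append(tok)
    return noisy


text = "to be, or not to be"
stoi, itos = build_vocab(text)
tokens = [stoi[ch] for ch in text]

rng = random.Random(0)
for flip_prob in (0.0, 0.3, 1.0):
    noisy = corrupt(tokens, flip_prob, len(stoi), rng)
    print(flip_prob, repr("".join(itos[i] for i in noisy)))
```

With `flip_prob` at 0.0 the text is unchanged, and at 1.0 every position is resampled, so the sequence carries no information about the original. A denoising model is trained on the levels in between.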

Quick Start & Requirements

The primary way to use the project is to run the provided Jupyter Notebook, either in Google Colab or in a local Jupyter instance. Users can optionally change the dataset, noise schedule, or model size to experiment. No hardware prerequisites, such as a GPU, are mentioned; a standard Python environment capable of running Jupyter notebooks is assumed. Links to the original nanoGPT and the relevant research paper are provided for deeper context.

Highlighted Details

  • Features a single, self-contained Jupyter Notebook for theory and implementation.
  • Implements a character-level discrete diffusion model for text generation.
  • Adapts Andrej Karpathy's 7.23M parameter character-level baby GPT.
  • Generates text by denoising all tokens in parallel, a departure from autoregressive models.
  • Covers the mathematical framework, score-entropy objective, and Discrete Tweedie Sampler for inference.
  • Demonstrates training on Shakespeare's text.
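The difference between parallel denoising and autoregressive decoding can be sketched by counting model calls. The example below is a toy under loud assumptions: `toy_denoiser` is a hypothetical stand-in that already knows the target string, and the update rule is not the Discrete Tweedie Sampler. It only illustrates the control flow, in which every position is updated on each step, so the number of calls depends on the number of denoising steps rather than on the sequence length.

```python
import random

TARGET = "all the world's a stage"
VOCAB = sorted(set(TARGET))


def toy_denoiser(seq, keep_prob, rng):
    """Hypothetical stand-in for the network: proposes a character for every position in one call."""
    return [
        TARGET[i] if rng.random() < keep_prob else seq[i]
        for i in range(len(seq))
    ]


def sample_parallel(length, num_steps, rng):
    """Start from pure noise and refine all positions together on each step."""
    seq = [rng.choice(VOCAB) for _ in range(length)]
    calls = 0
    for step in range(num_steps):
        # Assumed schedule: confidence grows linearly and reaches 1.0 on the last step.
        keep_prob = (step + 1) / num_steps
        seq = toy_denoiser(seq, keep_prob, rng)
        calls += 1
    return "".join(seq), calls


def sample_autoregressive(length):
    """Emit one character per call, left to right."""
    out = []
    calls = 0
    for i in range(length):
        out.append(TARGET[i])
        calls += 1
    return "".join(out), calls


rng = random.Random(0)
parallel_text, parallel_calls = sample_parallel(len(TARGET), num_steps=8, rng=rng)
ar_text, ar_calls = sample_autoregressive(len(TARGET))
print("parallel:", repr(parallel_text), "calls =", parallel_calls)
print("autoregressive:", repr(ar_text), "calls =", ar_calls)
```

In a real model the quality of the result depends on the number of denoising steps, so the call count is a tunable trade-off rather than a free saving.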

Maintenance & Community

The project is authored by Ashwani Kumar. No community channels (e.g., Discord, Slack), roadmap, or sponsorship information is detailed in the provided README.

Licensing & Compatibility

The README does not explicitly state a software license. Users should check the repository for license terms before commercial use or integration into closed-source projects.

Limitations & Caveats

The project is presented as an educational guide and research starting point, rather than a production-ready system. Its focus is on character-level generation, and it relies on a Jupyter Notebook environment, which may not be suitable for all deployment scenarios.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

3 stars in the last 30 days
