ash80: Character-level text generation using discrete diffusion
Top 99.6% on SourcePulse
This repository provides an annotated Jupyter Notebook that implements a character-level discrete diffusion model for text generation, adapted from Andrej Karpathy's baby GPT. It targets researchers and practitioners exploring diffusion models as an alternative to autoregressive language models. The model generates tokens in parallel, and the notebook doubles as an educational resource.
How It Works
The project adapts a 7.23M-parameter character-level GPT architecture to a discrete diffusion framework. The model learns to denoise corrupted text sequences by modeling the score function of the data distribution, and it is trained with a score-entropy-based objective. Text is generated by inverting a token-flipping noising process. At inference, the Discrete Tweedie Sampler updates positions in parallel, in contrast to traditional token-by-token autoregressive decoding.
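The token-flipping noising process mentioned above can be illustrated with a minimal sketch. This is not the notebook's code: the vocabulary, the `corrupt` function, and the fixed flip probabilities are hypothetical, and the sketch assumes uniform corruption, where each character is independently resampled from the vocabulary with some probability. A denoising model would be trained to undo this kind of corruption.

```python
import random
import string

# Hypothetical vocabulary; the notebook derives its own from the training text.
VOCAB = list(string.ascii_lowercase + " ")

def corrupt(text, flip_prob, rng):
    """Independently resample each character from VOCAB with probability flip_prob.

    A resampled character may happen to equal the original one.
    """
    out = []
    for ch in text:
        if rng.random() < flip_prob:
            out.append(rng.choice(VOCAB))  # token flipped to a random symbol
        else:
            out.append(ch)                 # token left untouched
    return "".join(out)

rng = random.Random(0)
clean = "to be or not to be"
for flip_prob in (0.0, 0.25, 0.5, 1.0):
    print(f"{flip_prob:.2f}  {corrupt(clean, flip_prob, rng)!r}")
```

At `flip_prob = 0.0` the text is returned unchanged, and at `flip_prob = 1.0` every position is drawn uniformly from the vocabulary, so no information about the original text survives.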
Quick Start & Requirements
The primary way to use the project is to run the provided Jupyter Notebook, either in Google Colab or in a local Jupyter instance. Users can optionally modify the dataset, noise schedule, or model size for experimentation. No hardware prerequisites, such as a GPU, are mentioned, but a standard Python environment capable of running Jupyter notebooks is assumed. Links to the original nanoGPT and the relevant research paper are provided for deeper context.
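As an illustration of what experimenting with the noise schedule might involve, the sketch below maps a diffusion time `t` in [0, 1] to a noise level and a per-token flip probability. It assumes a geometric schedule, a common choice for this family of models. Whether the notebook uses this exact form is an assumption, and the function names and the `sigma_min` and `sigma_max` values are hypothetical.

```python
import math

def geometric_sigma(t, sigma_min=1e-3, sigma_max=6.0):
    """Noise level at time t in [0, 1], interpolated geometrically
    between sigma_min (t = 0) and sigma_max (t = 1). Values are illustrative."""
    return sigma_min ** (1.0 - t) * sigma_max ** t

def flip_probability(t, sigma_min=1e-3, sigma_max=6.0):
    """Probability that a token has been resampled by time t,
    assuming it is 1 - exp(-sigma(t))."""
    return 1.0 - math.exp(-geometric_sigma(t, sigma_min, sigma_max))

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"t={t:.2f}  sigma={geometric_sigma(t):.4f}  "
          f"flip_prob={flip_probability(t):.4f}")
```

Raising `sigma_max` pushes the final state closer to pure noise, while lowering `sigma_min` keeps early steps closer to the clean text.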
Maintenance & Community
The project is authored by Ashwani Kumar. The provided README does not mention community channels (e.g., Discord, Slack), a roadmap, or sponsorship information.
Licensing & Compatibility
The README does not explicitly state a software license. Users should confirm the licensing terms before commercial use or integration into closed-source projects.
Limitations & Caveats
The project is presented as an educational guide and research starting point, rather than a production-ready system. Its focus is on character-level generation, and it relies on a Jupyter Notebook environment, which may not be suitable for all deployment scenarios.
5 months ago
Inactive