tiny-diffusion by nathan-barry

Small diffusion model for character-level text generation

Created 1 month ago
588 stars

Top 55.2% on SourcePulse

Project Summary

tiny-diffusion is a compact, 10.7-million-parameter character-level diffusion model for text generation. It makes diffusion models accessible for local experimentation, letting engineers and researchers explore diffusion-based text generation without substantial computational resources.

How It Works

The model is a modified version of the nanochat GPT architecture, adapted for character-level diffusion, and processes text sequences up to 256 characters long. Rather than generating text left-to-right, it applies diffusion principles, more commonly seen in image generation, within a significantly reduced parameter footprint. This distinguishes it from larger, more resource-intensive language models and makes local experimentation practical.
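
As an illustration, the sketch below shows the discrete (masked) diffusion sampling idea in PyTorch. The model interface, the mask token, and the confidence-based unmasking schedule are all assumptions made for exposition; they are not taken from the repository's code.

    import torch

    # Illustrative sketch of masked-diffusion sampling for character-level
    # text. `model` is assumed to map a partially masked sequence of
    # character ids to per-position logits; this is NOT the repository's API.
    MASK_ID = 0       # assumed id of a special mask character
    SEQ_LEN = 256     # sequence length reported for tiny-diffusion
    NUM_STEPS = 128   # diffusion steps reported for tiny-diffusion

    @torch.no_grad()
    def sample(model, device="cpu"):
        # Start from a fully masked sequence: the discrete analogue of noise.
        x = torch.full((1, SEQ_LEN), MASK_ID, dtype=torch.long, device=device)
        for step in range(NUM_STEPS):
            logits = model(x)                      # (1, SEQ_LEN, vocab_size)
            probs = torch.softmax(logits, dim=-1)
            conf, pred = probs.max(dim=-1)         # best guess per position
            masked = x == MASK_ID
            n_masked = int(masked.sum())
            if n_masked == 0:
                break
            # Unmask a fraction of the remaining positions each step,
            # most confident positions first.
            n_reveal = max(1, n_masked // (NUM_STEPS - step))
            conf = conf.masked_fill(~masked, float("-inf"))
            idx = conf.topk(n_reveal, dim=-1).indices
            x[0, idx[0]] = pred[0, idx[0]]
        return x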

Quick Start & Requirements

  • Installation: Clone the repository and run uv sync (requires Python 3.10+).
  • Running Generation: Execute uv run sample.py to generate text with the pre-trained weights (a checkpoint-inspection sketch follows this list).
  • Training: Train the model from scratch with uv run training.py.
  • Visualization: Explore the denoising process with uv run animations/diffusion-process.py.
  • Dependencies: Python 3.10+ and the uv package manager; uv sync installs the remaining packages.
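
Before sampling or retraining, it can be useful to peek at the pre-trained checkpoint. The snippet below is a hypothetical way to inspect weights/diffusion_model.pt; the checkpoint's actual layout is defined by sample.py and training.py, not here.

    import torch

    # Hypothetical inspection of the pre-trained checkpoint; sample.py is
    # the authoritative loading path. A .pt file usually holds either a raw
    # state_dict or a wrapper dict with extra metadata.
    ckpt = torch.load("weights/diffusion_model.pt", map_location="cpu")

    if isinstance(ckpt, dict):
        # Print the first few top-level keys to see which layout this is.
        print(list(ckpt.keys())[:10])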

Highlighted Details

  • Parameters: 10.7 million.
  • Architecture: 6 layers, 6 attention heads, 384-dimensional embeddings (collected into the config sketch after this list).
  • Sequence Length: 256 characters.
  • Diffusion Steps: 128.
  • Training Data: Tiny Shakespeare dataset.
  • Pre-trained Weights: Provided (weights/diffusion_model.pt).
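
For reference, the published hyperparameters fit in a small config object. This dataclass is a hypothetical convenience for readers; the repository may structure its configuration differently.

    from dataclasses import dataclass

    # Hypothetical config collecting the published hyperparameters; the
    # repository may organize these values differently.
    @dataclass
    class TinyDiffusionConfig:
        n_layer: int = 6            # transformer layers
        n_head: int = 6             # attention heads
        n_embd: int = 384           # embedding dimension
        seq_len: int = 256          # characters per sequence
        diffusion_steps: int = 128  # denoising steps

    print(TinyDiffusionConfig())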

Maintenance & Community

The README snippet provides no details on contributors, community channels, or a roadmap.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits broad use, including commercial applications; the only notable obligation is preserving the copyright and license notice.

Limitations & Caveats

The model operates strictly at the character level, which may reduce coherence, grammatical correctness, and linguistic nuance relative to token-based or word-based language models. It is trained exclusively on the Tiny Shakespeare dataset, limiting its output to that corpus's style and vocabulary. Generation is currently capped at a 30-character context, which may restrict the flow of longer outputs.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 485 stars in the last 30 days
