tiny-diffusion by nathan-barry

Small diffusion model for character-level text generation

Created 1 month ago
588 stars

Top 55.2% on SourcePulse

Project Summary

tiny-diffusion is a compact, 10.7-million-parameter character-level diffusion model for text generation. It makes diffusion models accessible for local experimentation, letting engineers and researchers explore diffusion-based text generation without substantial computational resources.

How It Works

The model is a modified version of the nanochat GPT architecture, adapted for character-level diffusion, and processes text sequences up to 256 characters long. Rather than generating text left-to-right, it applies diffusion principles, more commonly seen in image generation, within a significantly reduced parameter footprint. This distinguishes it from larger, more resource-intensive language models and makes local experimentation practical.
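
As an illustration, the sketch below shows the discrete (masked) diffusion sampling idea in PyTorch. The model interface, the mask token, and the confidence-based unmasking schedule are all assumptions made for exposition; they are not taken from the repository's code.

    import torch

    # Illustrative sketch of masked-diffusion sampling for character-level
    # text. `model` is assumed to map a partially masked sequence of
    # character ids to per-position logits; this is NOT the repository's API.
    MASK_ID = 0       # assumed id of a special mask character
    SEQ_LEN = 256     # sequence length reported for tiny-diffusion
    NUM_STEPS = 128   # diffusion steps reported for tiny-diffusion

    @torch.no_grad()
    def sample(model, device="cpu"):
        # Start from a fully masked sequence: the discrete analogue of noise.
        x = torch.full((1, SEQ_LEN), MASK_ID, dtype=torch.long, device=device)
        for step in range(NUM_STEPS):
            logits = model(x)                      # (1, SEQ_LEN, vocab_size)
            probs = torch.softmax(logits, dim=-1)
            conf, pred = probs.max(dim=-1)         # best guess per position
            masked = x == MASK_ID
            n_masked = int(masked.sum())
            if n_masked == 0:
                break
            # Unmask a fraction of the remaining positions each step,
            # most confident positions first.
            n_reveal = max(1, n_masked // (NUM_STEPS - step))
            conf = conf.masked_fill(~masked, float("-inf"))
            idx = conf.topk(n_reveal, dim=-1).indices
            x[0, idx[0]] = pred[0, idx[0]]
        return x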

Quick Start & Requirements

  • Installation: Clone the repository and run uv sync (requires Python 3.10+).
  • Running Generation: Execute uv run sample.py to generate text with the pre-trained weights (a checkpoint-inspection sketch follows this list).
  • Training: Train the model from scratch with uv run training.py.
  • Visualization: Explore the denoising process with uv run animations/diffusion-process.py.
  • Dependencies: Python 3.10+ and the uv package manager; uv sync installs the remaining packages.
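
Before sampling or retraining, it can be useful to peek at the pre-trained checkpoint. The snippet below is a hypothetical way to inspect weights/diffusion_model.pt; the checkpoint's actual layout is defined by sample.py and training.py, not here.

    import torch

    # Hypothetical inspection of the pre-trained checkpoint; sample.py is
    # the authoritative loading path. A .pt file usually holds either a raw
    # state_dict or a wrapper dict with extra metadata.
    ckpt = torch.load("weights/diffusion_model.pt", map_location="cpu")

    if isinstance(ckpt, dict):
        # Print the first few top-level keys to see which layout this is.
        print(list(ckpt.keys())[:10])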

Highlighted Details

  • Parameters: 10.7 million.
  • Architecture: 6 layers, 6 attention heads, 384-dimensional embeddings (collected into the config sketch after this list).
  • Sequence Length: 256 characters.
  • Diffusion Steps: 128.
  • Training Data: Tiny Shakespeare dataset.
  • Pre-trained Weights: Provided (weights/diffusion_model.pt).
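
For reference, the published hyperparameters fit in a small config object. This dataclass is a hypothetical convenience for readers; the repository may structure its configuration differently.

    from dataclasses import dataclass

    # Hypothetical config collecting the published hyperparameters; the
    # repository may organize these values differently.
    @dataclass
    class TinyDiffusionConfig:
        n_layer: int = 6            # transformer layers
        n_head: int = 6             # attention heads
        n_embd: int = 384           # embedding dimension
        seq_len: int = 256          # characters per sequence
        diffusion_steps: int = 128  # denoising steps

    print(TinyDiffusionConfig())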

Maintenance & Community

The README snippet provides no details on contributors, community channels, or a roadmap.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits broad use, including commercial applications; the only notable obligation is preserving the copyright and license notice.

Limitations & Caveats

The model operates strictly at the character level, which may reduce coherence, grammatical correctness, and linguistic nuance relative to token-based or word-based language models. It is trained exclusively on the Tiny Shakespeare dataset, limiting its output to that corpus's style and vocabulary. Generation is currently capped at a 30-character context, which may restrict the flow of longer outputs.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 485 stars in the last 30 days
