guppylm  by arman-bd

Accessible LLM training and inference

Created 1 week ago

New!

2,319 stars

Top 19.2% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

GuppyLM is a ~9M parameter language model designed to demystify the end-to-end process of training an LLM from scratch. It targets engineers and researchers who want to understand LLM mechanics without extensive compute or prior expertise. The project provides a fully runnable pipeline, from data generation to inference, letting users build their own LLM in minutes and see inside what is usually treated as a "black box" model.

How It Works

The project employs a deliberately simple, vanilla transformer architecture with 8.7 million parameters, eschewing advanced optimizations such as grouped-query attention (GQA), rotary position embeddings (RoPE), or SwiGLU activations in favor of clarity and ease of implementation. It uses a BPE tokenizer with a 4,096-token vocabulary and a 128-token maximum sequence length. Training runs on 60,000 synthetic conversations generated via template composition across 60 distinct topics, giving the model a consistent, fish-like personality. This minimalist approach prioritizes educational value and rapid iteration over raw performance.
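As a rough illustration of the scale involved, the parameter count of such a vanilla transformer can be estimated directly from its hyperparameters. The width and depth below are hypothetical values chosen to land near the stated 8.7M; GuppyLM's actual configuration may differ, though the vocabulary and sequence length match the README:

```python
def count_vanilla_transformer_params(
    vocab_size=4096,   # BPE vocabulary size (stated in the README)
    max_seq_len=128,   # maximum sequence length (stated in the README)
    d_model=256,       # hypothetical embedding width
    n_layers=8,        # hypothetical number of transformer blocks
    d_ff=1024,         # hypothetical feed-forward width (4 * d_model)
):
    """Estimate parameters of a vanilla (no GQA/RoPE/SwiGLU) transformer."""
    # Token embeddings plus learned positional embeddings
    embeddings = vocab_size * d_model + max_seq_len * d_model
    # Per block: Q/K/V/output projections (weights + biases)
    attention = 4 * d_model * d_model + 4 * d_model
    # Per block: two-layer MLP with biases
    feed_forward = d_model * d_ff + d_ff + d_ff * d_model + d_model
    # Per block: two LayerNorms (scale + shift each)
    layer_norms = 2 * 2 * d_model
    block = attention + feed_forward + layer_norms
    # Final LayerNorm and an untied output head
    final_norm = 2 * d_model
    lm_head = vocab_size * d_model
    return embeddings + n_layers * block + final_norm + lm_head

print(count_vanilla_transformer_params())  # ~8.4M with these assumed settings
```

With these assumed settings the estimate comes out around 8.4M, the same ballpark as the project's reported 8.7M; small changes to width, depth, or weight tying account for the difference.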

Quick Start & Requirements

  • Browser Demo: No install needed; runs locally via WebAssembly with a quantized ONNX model (~10 MB). Link
  • Colab: Pre-trained model chat and full training pipeline available. Notebook Generator
  • Local CLI: pip install torch tokenizers, then python -m guppylm chat.
  • Training: Requires a T4 GPU (available in Colab) and approximately 5 minutes.
  • Dataset: arman-bd/guppylm-60k-generic

Highlighted Details

  • Minimalist 8.7M parameter vanilla transformer architecture.
  • End-to-end training demonstrated in a single Colab notebook.
  • Browser-based inference via WebAssembly and quantized ONNX.
  • Synthetic data generation pipeline for consistent personality.
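The synthetic-data bullet above can be sketched as template composition: pair question and answer templates with a topic list, then fill the slots. The topics and templates here are invented placeholders for illustration, not GuppyLM's actual dataset contents:

```python
import random

# Hypothetical stand-ins for the project's 60 topics and its templates.
TOPICS = ["coral reefs", "tides", "plankton", "ocean currents"]
TEMPLATES = [
    ("Can you tell me about {topic}?",
     "Blub blub! {topic_cap} is one of my favorite things in the sea."),
    ("What do you think of {topic}?",
     "As a little fish, I swim past {topic} all the time!"),
]

def make_conversation(rng):
    """Compose one user/assistant exchange from a random template and topic."""
    topic = rng.choice(TOPICS)
    user_template, bot_template = rng.choice(TEMPLATES)
    return {
        "user": user_template.format(topic=topic),
        "assistant": bot_template.format(
            topic=topic, topic_cap=topic.capitalize()
        ),
    }

rng = random.Random(42)
dataset = [make_conversation(rng) for _ in range(3)]
for conv in dataset:
    print(conv["user"], "->", conv["assistant"])
```

Because every answer template carries the same voice, the fine-tuned model absorbs a uniform personality; scaling this loop across 60 topics and many templates yields the 60k-conversation corpus described above.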

Maintenance & Community

The provided README does not detail specific maintenance contributors, community channels (like Discord/Slack), or a public roadmap.

Licensing & Compatibility

Released under the MIT license, permitting commercial use and integration into closed-source projects without significant restrictions.

Limitations & Caveats

The model's 128-token context window severely limits multi-turn conversation quality, leading to degradation after a few exchanges. Its personality is hardcoded into the weights rather than being controllable via system prompts, and it does not comprehend complex human abstractions.
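The context limit means a multi-turn chat loop must discard the oldest tokens once accumulated history exceeds 128. A minimal sliding-window scheme (the token IDs and helper below are illustrative, not GuppyLM's actual inference code) shows why earlier turns vanish:

```python
def fit_to_context(history_ids, max_seq_len=128):
    """Keep only the most recent tokens that fit in the model's window.

    Everything older is silently discarded, which is why conversation
    quality degrades after a few exchanges: the model literally cannot
    see earlier turns.
    """
    if len(history_ids) <= max_seq_len:
        return history_ids
    return history_ids[-max_seq_len:]

# Simulate a chat whose accumulated history outgrows the window.
history = list(range(300))     # 300 fake token IDs from prior turns
window = fit_to_context(history)
print(len(window), window[0])  # 128 tokens, starting at ID 172
```

A few exchanges at typical message lengths are enough to push the first turns out of the window entirely.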

Health Check
Last Commit

3 days ago

Responsiveness

Inactive

Pull Requests (30d)
5
Issues (30d)
0
Star History
2,331 stars in the last 13 days

Explore Similar Projects

Starred by Maxime Labonne (Head of Post-Training at Liquid AI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 19 more.

llm-course by mlabonne

Top 0.5% · 78k stars
LLM course with roadmaps and notebooks
Created 2 years ago · Updated 2 months ago