AutoGrad-Engine by milanm

Pure C# GPT implementation for learning

Created 1 week ago

317 stars

Top 85.6% on SourcePulse

View on GitHub
Project Summary

A complete GPT language model implementation in pure C#, covering both training and inference, with zero external dependencies. It serves as an educational tool for developers and researchers aiming to understand the core algorithms behind models like ChatGPT, offering a faithful port of Andrej Karpathy's microgpt.py. The primary benefit is demystifying complex ML concepts through a self-contained, understandable codebase.

How It Works

The project implements a character-level GPT model built on a custom autograd engine (the Value class) that performs automatic differentiation via backpropagation. The architecture follows modern transformer conventions: a pre-norm design with RMSNorm, multi-head attention with scaled dot-product scores (Q·Kᵀ/√d), and a feed-forward MLP with squared ReLU activation. Weights are tied between the input token embeddings and the output projection. Unlike production systems, it processes data serially (one number at a time) so that every fundamental operation stays visible, which is what makes it an accessible learning resource.
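
The scalar autograd idea can be sketched in a few lines of Python, in the spirit of Karpathy's micrograd. This is an illustrative sketch only, not the project's actual C# Value class; the names and signatures here are hypothetical:

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, children=(), op=""):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # set by the op that produced this node
        self._prev = set(children)
        self._op = op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1, so the output gradient passes through.
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a (chain rule).
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the computation graph, then apply the
        # chain rule in reverse order from the output.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# z = x*y + x, so dz/dx = y + 1 = 4 and dz/dy = x = 2.
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
```

Every tensor operation in a real framework reduces to graphs of such scalar nodes; the project applies the same mechanism one number at a time.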

Quick Start & Requirements

  • Installation: Navigate to src/AutogradEngine and run dotnet run.
  • Prerequisites: .NET SDK.
  • Resources: Training completes in minutes on a CPU for the example dataset.
  • Further Reading: The README references Karpathy's original microgpt.py, Karpathy's micrograd, Karpathy's "Let's build GPT from scratch" video lecture, the "Attention Is All You Need" paper, the GPT-2 Paper, and "The Illustrated Transformer".

Highlighted Details

  • Zero Dependencies: A complete GPT implementation without requiring PyTorch, TensorFlow, or any NuGet packages.
  • Autograd Engine: Built-in automatic differentiation system tracks computation graphs and computes gradients using the chain rule.
  • Modern Transformer Choices: Employs RMSNorm, squared ReLU activation, and a pre-norm architecture, differing from earlier GPT versions.
  • Educational Design: Processes data serially to clearly demonstrate every conceptual piece of a GPT, contrasting with the massive parallel tensor operations of production models.
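
The two nonstandard activation/normalization choices above are simple to state. The following Python sketch assumes the usual textbook definitions (it is illustrative, not the project's C# code, and omits the learned gain that RMSNorm implementations often carry):

```python
import math

def rms_norm(xs, eps=1e-5):
    """RMSNorm: divide by the root-mean-square of the vector.
    Unlike LayerNorm, no mean is subtracted (and no gain is learned here)."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

def squared_relu(x):
    """Squared ReLU: max(0, x)^2 -- zero for negatives, like ReLU,
    but with a smooth joint at zero."""
    return max(0.0, x) ** 2
```

Pre-norm means each sublayer sees normalized input (x + sublayer(norm(x))), which tends to stabilize training compared with the post-norm layout of the original transformer.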

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap are present in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows use, modification, and redistribution for educational and commercial purposes alike.

Limitations & Caveats

This project is explicitly an educational tool, not production-ready code. Its scalar, single-number processing approach makes it orders of magnitude slower and less scalable than industrial GPT implementations. The example trains a very small model on a limited dataset with a small context window.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 320 stars in the last 13 days

Explore Similar Projects

Starred by Maxime Labonne (Head of Post-Training at Liquid AI), Lewis Tunstall (Research Engineer at Hugging Face), and 1 more.

ML-Notebooks by dair-ai

ML notebooks for education/research. Top 0.1% · 3k stars. Created 3 years ago, updated 1 year ago.