AutoGrad-Engine by milanm

Pure C# GPT implementation for learning

Created 1 week ago

317 stars

Top 85.6% on SourcePulse

View on GitHub
Project Summary

A complete GPT language model implementation in pure C#, covering both training and inference, with zero external dependencies. It serves as an educational tool for developers and researchers aiming to understand the core algorithms behind models like ChatGPT, offering a faithful port of Andrej Karpathy's microgpt.py. The primary benefit is demystifying complex ML concepts through a self-contained, understandable codebase.

How It Works

The project implements a character-level GPT model built on a custom autograd engine (the Value class) that performs automatic differentiation via backpropagation. The architecture follows modern transformer conventions: a pre-norm design with RMSNorm, multi-head attention with scaled dot-product scores (Q·Kᵀ/√d), and a feed-forward MLP with squared ReLU activation. Weights are tied between the input token embeddings and the output projection. Unlike production systems, it processes data serially (one number at a time) so that every fundamental operation stays visible, which is what makes it an accessible learning resource.
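
The scalar autograd idea can be sketched in a few lines of Python, in the spirit of Karpathy's micrograd. This is an illustrative sketch only, not the project's actual C# Value class; the names and signatures here are hypothetical:

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, children=(), op=""):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None  # set by the op that produced this node
        self._prev = set(children)
        self._op = op

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other), "+")
        def _backward():
            # d(a+b)/da = d(a+b)/db = 1, so the output gradient passes through.
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other), "*")
        def _backward():
            # d(a*b)/da = b and d(a*b)/db = a (chain rule).
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the computation graph, then apply the
        # chain rule in reverse order from the output.
        topo, visited = [], set()
        def build(v):
            if v not in visited:
                visited.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

# z = x*y + x, so dz/dx = y + 1 = 4 and dz/dy = x = 2.
x, y = Value(2.0), Value(3.0)
z = x * y + x
z.backward()
```

Every tensor operation in a real framework reduces to graphs of such scalar nodes; the project applies the same mechanism one number at a time.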

Quick Start & Requirements

  • Installation: Navigate to src/AutogradEngine and run dotnet run.
  • Prerequisites: .NET SDK.
  • Resources: Training completes in minutes on a CPU for the example dataset.
  • Further Reading: The README references Karpathy's original microgpt.py, Karpathy's micrograd, Karpathy's "Let's build GPT from scratch" video lecture, the "Attention Is All You Need" paper, the GPT-2 Paper, and "The Illustrated Transformer".

Highlighted Details

  • Zero Dependencies: A complete GPT implementation without requiring PyTorch, TensorFlow, or any NuGet packages.
  • Autograd Engine: Built-in automatic differentiation system tracks computation graphs and computes gradients using the chain rule.
  • Modern Transformer Choices: Employs RMSNorm, squared ReLU activation, and a pre-norm architecture, differing from earlier GPT versions.
  • Educational Design: Processes data serially to clearly demonstrate every conceptual piece of a GPT, contrasting with the massive parallel tensor operations of production models.
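
The two nonstandard activation/normalization choices above are simple to state. The following Python sketch assumes the usual textbook definitions (it is illustrative, not the project's C# code, and omits the learned gain that RMSNorm implementations often carry):

```python
import math

def rms_norm(xs, eps=1e-5):
    """RMSNorm: divide by the root-mean-square of the vector.
    Unlike LayerNorm, no mean is subtracted (and no gain is learned here)."""
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [x / rms for x in xs]

def squared_relu(x):
    """Squared ReLU: max(0, x)^2 -- zero for negatives, like ReLU,
    but with a smooth joint at zero."""
    return max(0.0, x) ** 2
```

Pre-norm means each sublayer sees normalized input (x + sublayer(norm(x))), which tends to stabilize training compared with the post-norm layout of the original transformer.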

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap are present in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows use, modification, and redistribution for educational and commercial purposes alike.

Limitations & Caveats

This project is explicitly an educational tool, not production-ready code. Its scalar, single-number processing approach makes it orders of magnitude slower and less scalable than industrial GPT implementations. The example trains a very small model on a limited dataset with a small context window.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 320 stars in the last 13 days

Explore Similar Projects

Starred by Maxime Labonne (Head of Post-Training at Liquid AI), Lewis Tunstall (Research Engineer at Hugging Face), and 1 more.

ML-Notebooks by dair-ai

ML notebooks for education/research. Top 0.1% · 3k stars. Created 3 years ago, updated 1 year ago.