femtoGPT by keyvank

Rust library for minimal Generative Pretrained Transformer (GPT) models

created 2 years ago
907 stars

Top 40.9% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer (GPT) model, supporting both training and inference. Built from scratch without external ML frameworks, it targets developers and researchers who want to understand LLM internals and the details of the GPT architecture.

How It Works

This project implements the GPT architecture from scratch in Rust, including tensor processing and the training/inference logic. It relies on minimal dependencies: rand for random number generation, serde/bincode for model serialization, and rayon for parallel computation. This approach allows a deep dive into LLM mechanics without the overhead of larger frameworks.
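To give a flavor of what "from scratch" means here, the sketch below implements single-head causal scaled dot-product attention over flat f32 slices using only the standard library. All names and shapes (attention, row-major layout, etc.) are illustrative assumptions, not femtoGPT's actual API:

```rust
// Illustrative sketch, NOT femtoGPT's actual code: single-head causal
// scaled dot-product attention over flattened row-major matrices.

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// q, k, v: seq_len rows of width `d`, flattened row-major.
fn attention(q: &[f32], k: &[f32], v: &[f32], d: usize) -> Vec<f32> {
    let n = q.len() / d;
    let scale = 1.0 / (d as f32).sqrt();
    let mut out = vec![0.0; n * d];
    for i in 0..n {
        // Causal mask: token i attends only to tokens 0..=i.
        let mut scores: Vec<f32> = (0..=i)
            .map(|j| dot(&q[i * d..(i + 1) * d], &k[j * d..(j + 1) * d]) * scale)
            .collect();
        // Softmax over the scores, max-subtracted for numerical stability.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0;
        for s in scores.iter_mut() {
            *s = (*s - max).exp();
            sum += *s;
        }
        // Output row i is the attention-weighted sum of value rows.
        for (j, s) in scores.iter().enumerate() {
            let w = s / sum;
            for c in 0..d {
                out[i * d + c] += w * v[j * d + c];
            }
        }
    }
    out
}

fn main() {
    let q = vec![1.0, 0.0, 0.0, 1.0];
    let k = q.clone();
    let v = vec![1.0, 2.0, 3.0, 4.0];
    let out = attention(&q, &k, &v, 2);
    // The first token can only attend to itself, so its output is v's row 0.
    assert_eq!(&out[..2], &[1.0, 2.0]);
    println!("{:?}", out);
}
```

In a real implementation this inner loop would be a candidate for rayon's parallel iterators, since each output row is independent given the causal mask.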

Quick Start & Requirements

  • Install: Rust toolchain required.
  • GPU: OpenCL runtimes (NVIDIA/AMD compatible) are needed for GPU acceleration. Install ocl-icd-opencl-dev on Debian-based systems.
  • Training Data: Place text data in dataset.txt.
  • Run: cargo run --release -- train or cargo run --release -- infer. Add --features gpu for GPU support.
  • Docs: The Super Programmer (book in progress).

Highlighted Details

  • Pure Rust implementation of GPT for training and inference.
  • Supports CPU and GPU (via OpenCL) acceleration.
  • Minimal dependencies, focusing on core LLM components.
  • Includes gradient checking for correctness.
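The gradient-checking idea mentioned above can be illustrated with a toy function: compare an analytic derivative against a central finite difference and require close agreement. The function, names, and tolerance below are hypothetical examples, not femtoGPT's code:

```rust
// Hypothetical gradient check, NOT femtoGPT's code: verify an analytic
// derivative against a central finite-difference approximation.

fn f(x: f64) -> f64 {
    x * x + 3.0 * x
}

// Hand-derived gradient of f; in a real network this would be the
// backward pass of a layer.
fn analytic_grad(x: f64) -> f64 {
    2.0 * x + 3.0
}

// Central difference: error shrinks as O(eps^2) for smooth f.
fn numeric_grad(x: f64, eps: f64) -> f64 {
    (f(x + eps) - f(x - eps)) / (2.0 * eps)
}

fn main() {
    let x = 1.5;
    let eps = 1e-5;
    let diff = (analytic_grad(x) - numeric_grad(x, eps)).abs();
    // A correct backward pass should agree to well within this tolerance.
    assert!(diff < 1e-6);
    println!("grad check diff = {diff:e}");
}
```

Applied layer by layer, this kind of check catches sign errors and missing terms in hand-written backward passes, which is exactly where a from-scratch implementation is most fragile.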

Maintenance & Community

  • Active development with regular updates and sample outputs.
  • Discord server available for discussions.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is self-described as a "minimal" implementation, and early sample outputs show limited coherence and grammar, though the author notes improvements with larger models and longer training. Correctness of all layers is not guaranteed.

Health Check
Last commit

3 weeks ago

Responsiveness

1 day

Pull Requests (30d)
1
Issues (30d)
0
Star History
40 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 7 more.

ThunderKittens by HazyResearch

0.6%
3k
CUDA kernel framework for fast deep learning primitives
created 1 year ago
updated 3 days ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 6 more.

gpt-neox by EleutherAI

0.1%
7k
Framework for training large-scale autoregressive language models
created 4 years ago
updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2%
25k
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

0.1%
30k
Minimalist deep learning framework for education and exploration
created 4 years ago
updated 19 hours ago