femtoGPT by keyvank

Rust library for minimal Generative Pretrained Transformer (GPT) models

created 2 years ago
907 stars

Top 40.9% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer (GPT) model, supporting both training and inference. Built from scratch without external ML frameworks, it targets developers and researchers who want to understand LLM internals and the details of the GPT architecture.

How It Works

This project implements the GPT architecture from scratch in Rust, including tensor processing and the training/inference logic. It relies on minimal dependencies: rand for random number generation, serde/bincode for model serialization, and rayon for parallel computation. This approach allows a deep dive into LLM mechanics without the overhead of larger frameworks.
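To give a flavor of what "from scratch" means here, the sketch below implements single-head causal scaled dot-product attention over flat f32 slices using only the standard library. All names and shapes (attention, row-major layout, etc.) are illustrative assumptions, not femtoGPT's actual API:

```rust
// Illustrative sketch, NOT femtoGPT's actual code: single-head causal
// scaled dot-product attention over flattened row-major matrices.

fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

// q, k, v: seq_len rows of width `d`, flattened row-major.
fn attention(q: &[f32], k: &[f32], v: &[f32], d: usize) -> Vec<f32> {
    let n = q.len() / d;
    let scale = 1.0 / (d as f32).sqrt();
    let mut out = vec![0.0; n * d];
    for i in 0..n {
        // Causal mask: token i attends only to tokens 0..=i.
        let mut scores: Vec<f32> = (0..=i)
            .map(|j| dot(&q[i * d..(i + 1) * d], &k[j * d..(j + 1) * d]) * scale)
            .collect();
        // Softmax over the scores, max-subtracted for numerical stability.
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let mut sum = 0.0;
        for s in scores.iter_mut() {
            *s = (*s - max).exp();
            sum += *s;
        }
        // Output row i is the attention-weighted sum of value rows.
        for (j, s) in scores.iter().enumerate() {
            let w = s / sum;
            for c in 0..d {
                out[i * d + c] += w * v[j * d + c];
            }
        }
    }
    out
}

fn main() {
    let q = vec![1.0, 0.0, 0.0, 1.0];
    let k = q.clone();
    let v = vec![1.0, 2.0, 3.0, 4.0];
    let out = attention(&q, &k, &v, 2);
    // The first token can only attend to itself, so its output is v's row 0.
    assert_eq!(&out[..2], &[1.0, 2.0]);
    println!("{:?}", out);
}
```

In a real implementation this inner loop would be a candidate for rayon's parallel iterators, since each output row is independent given the causal mask.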

Quick Start & Requirements

  • Install: Rust toolchain required.
  • GPU: OpenCL runtimes (NVIDIA/AMD compatible) are needed for GPU acceleration. Install ocl-icd-opencl-dev on Debian-based systems.
  • Training Data: Place text data in dataset.txt.
  • Run: cargo run --release -- train or cargo run --release -- infer. Add --features gpu for GPU support.
  • Docs: The Super Programmer (book in progress).

Highlighted Details

  • Pure Rust implementation of GPT for training and inference.
  • Supports CPU and GPU (via OpenCL) acceleration.
  • Minimal dependencies, focusing on core LLM components.
  • Includes gradient checking for correctness.
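The gradient-checking idea mentioned above can be illustrated with a toy function: compare an analytic derivative against a central finite difference and require close agreement. The function, names, and tolerance below are hypothetical examples, not femtoGPT's code:

```rust
// Hypothetical gradient check, NOT femtoGPT's code: verify an analytic
// derivative against a central finite-difference approximation.

fn f(x: f64) -> f64 {
    x * x + 3.0 * x
}

// Hand-derived gradient of f; in a real network this would be the
// backward pass of a layer.
fn analytic_grad(x: f64) -> f64 {
    2.0 * x + 3.0
}

// Central difference: error shrinks as O(eps^2) for smooth f.
fn numeric_grad(x: f64, eps: f64) -> f64 {
    (f(x + eps) - f(x - eps)) / (2.0 * eps)
}

fn main() {
    let x = 1.5;
    let eps = 1e-5;
    let diff = (analytic_grad(x) - numeric_grad(x, eps)).abs();
    // A correct backward pass should agree to well within this tolerance.
    assert!(diff < 1e-6);
    println!("grad check diff = {diff:e}");
}
```

Applied layer by layer, this kind of check catches sign errors and missing terms in hand-written backward passes, which is exactly where a from-scratch implementation is most fragile.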

Maintenance & Community

  • Active development with regular updates and sample outputs.
  • Discord server available for discussions.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is self-described as a "minimal" implementation, and early sample outputs show limited coherence and grammar, though the author notes improvements with larger models and longer training. Correctness of all layers is not guaranteed.

Health Check
Last commit

3 weeks ago

Responsiveness

1 day

Pull Requests (30d)
1
Issues (30d)
0
Star History
40 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 7 more.

ThunderKittens by HazyResearch

0.6%
3k
CUDA kernel framework for fast deep learning primitives
created 1 year ago
updated 3 days ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 6 more.

gpt-neox by EleutherAI

0.1%
7k
Framework for training large-scale autoregressive language models
created 4 years ago
updated 1 week ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2%
25k
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai), Anton Bukov (Cofounder of 1inch Network), and 16 more.

tinygrad by tinygrad

0.1%
30k
Minimalist deep learning framework for education and exploration
created 4 years ago
updated 19 hours ago