femtoGPT by keyvank

Rust library for minimal Generative Pretrained Transformer (GPT) models

Created 2 years ago
914 stars

Top 39.9% on SourcePulse

1 Expert Loves This Project
Project Summary

femtoGPT is a pure Rust implementation of a minimal Generative Pretrained Transformer (GPT) model, designed for both training and inference. It caters to developers and researchers interested in understanding LLM internals, offering a from-scratch approach that avoids external ML frameworks. The project provides a foundational understanding of GPT architecture and implementation details.

How It Works

This project implements the GPT architecture from scratch in Rust, including tensor processing and the training/inference logic. It relies on minimal dependencies: rand for random number generation, serde/bincode for model serialization, and rayon for parallel computing. This approach allows a deep dive into LLM mechanics without the overhead of larger frameworks.
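
To give a flavor of what "tensor processing without an ML framework" involves, here is a minimal sketch in plain Rust of two core GPT building blocks, a matrix multiply and a numerically stable softmax. The function names and flat-`Vec<f32>` layout are hypothetical illustrations, not femtoGPT's actual API:

```rust
// Hypothetical sketch: matrices stored as flat Vec<f32> with explicit dims.
// a is n x k, b is k x m; the result is n x m.
fn matmul(a: &[f32], b: &[f32], n: usize, k: usize, m: usize) -> Vec<f32> {
    let mut out = vec![0.0f32; n * m];
    for i in 0..n {
        for j in 0..m {
            let mut acc = 0.0f32;
            for p in 0..k {
                acc += a[i * k + p] * b[p * m + j];
            }
            out[i * m + j] = acc;
        }
    }
    out
}

// Numerically stable softmax over one row of logits:
// subtracting the max avoids overflow in exp().
fn softmax(logits: &[f32]) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    // (1x2) times the 2x2 identity, then softmax over the result.
    let y = matmul(&[1.0, 2.0], &[1.0, 0.0, 0.0, 1.0], 1, 2, 2);
    let p = softmax(&y);
    println!("{:?} {:?}", y, p);
}
```

Writing these kernels by hand is exactly the kind of detail larger frameworks hide, which is the point of the project.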

Quick Start & Requirements

  • Install: Rust toolchain required.
  • GPU: OpenCL runtimes (NVIDIA/AMD compatible) are needed for GPU acceleration. Install ocl-icd-opencl-dev on Debian systems.
  • Training Data: Place text data in dataset.txt.
  • Run: cargo run --release -- train or cargo run --release -- infer. Add --features gpu for GPU support.
  • Docs: The Super Programmer (book in progress).
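
Putting the steps above together, a typical session looks like this (the apt package and cargo flags are those stated above; adjust for your distribution):

```shell
# Optional: OpenCL ICD loader for GPU support on Debian-based systems
sudo apt install ocl-icd-opencl-dev

# Put your training corpus in dataset.txt, then train on CPU:
cargo run --release -- train

# Generate text from the trained model:
cargo run --release -- infer

# Enable the OpenCL backend (NVIDIA/AMD compatible) for either command:
cargo run --release --features gpu -- train
```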

Highlighted Details

  • Pure Rust implementation of GPT for training and inference.
  • Supports CPU and GPU (via OpenCL) acceleration.
  • Minimal dependencies, focusing on core LLM components.
  • Includes gradient checking for correctness.
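
The gradient-checking technique mentioned above can be sketched with central finite differences: perturb one parameter by ±eps, re-evaluate the loss, and compare the numerical slope against the analytic gradient. This is a stand-alone illustrative example, not femtoGPT's own code:

```rust
// Central finite-difference estimate of d(loss)/d(x[i]).
fn numerical_grad<F: Fn(&[f32]) -> f32>(f: &F, x: &[f32], i: usize) -> f32 {
    let eps = 1e-3f32;
    let mut xp = x.to_vec();
    let mut xm = x.to_vec();
    xp[i] += eps;
    xm[i] -= eps;
    (f(&xp) - f(&xm)) / (2.0 * eps)
}

fn main() {
    // Toy loss f(x) = sum(x_i^2); the analytic gradient is 2 * x_i.
    let f = |x: &[f32]| x.iter().map(|v| v * v).sum::<f32>();
    let x = vec![0.5f32, -1.0, 2.0];
    for i in 0..x.len() {
        let num = numerical_grad(&f, &x, i);
        let ana = 2.0 * x[i];
        assert!((num - ana).abs() < 1e-2, "gradient mismatch at index {}", i);
    }
    println!("gradient check passed");
}
```

Running this check per layer catches backward-pass bugs that would otherwise surface only as silently degraded training.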

Maintenance & Community

  • Active development with regular updates and sample outputs.
  • Discord server available for discussions.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is deliberately minimal, and early sample outputs show limited coherence and grammar, though the author notes improvements with larger models and longer training. Correctness of all layers is not guaranteed.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

Top 0% on SourcePulse · 790 stars
Toolkit for easy model parallelization
Created 4 years ago · Updated 2 years ago
Starred by Bojan Tunguz (AI Scientist; formerly at NVIDIA), Alex Chen (Cofounder of Nexa AI), and 19 more.

ggml by ggml-org

Top 0.3% on SourcePulse · 13k stars
Tensor library for machine learning
Created 3 years ago · Updated 2 days ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of ClickHouse), and 29 more.

llm.c by karpathy

Top 0.2% on SourcePulse · 28k stars
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago · Updated 2 months ago