LLMs-from-scratch by rasbt

Educational resource for LLM construction in PyTorch

created 2 years ago
60,573 stars

Top 0.4% on sourcepulse

Project Summary

This repository provides the complete PyTorch code for building, pretraining, and finetuning a GPT-like Large Language Model (LLM) from scratch, mirroring techniques used in models like ChatGPT. It's an educational resource, primarily for developers and researchers aiming to understand LLM internals through hands-on implementation, as detailed in the accompanying Manning book.

How It Works

The project guides users through implementing the core LLM components: byte pair encoding (BPE) tokenization, attention mechanisms, and the GPT architecture itself. It then covers pretraining on unlabeled text and finetuning for specific tasks such as text classification and instruction following, using a step-by-step, code-centric approach that breaks complex concepts into small, executable pieces.
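To make the attention step concrete, here is a minimal single-head causal self-attention layer in plain PyTorch. This is an illustrative sketch, not the book's exact implementation; the class name and dimensions are chosen for the example.

    import torch
    import torch.nn as nn

    class CausalSelfAttention(nn.Module):
        # Minimal single-head causal self-attention; illustrative only,
        # not the book's exact implementation.
        def __init__(self, d_in, d_out, context_length):
            super().__init__()
            self.W_query = nn.Linear(d_in, d_out, bias=False)
            self.W_key = nn.Linear(d_in, d_out, bias=False)
            self.W_value = nn.Linear(d_in, d_out, bias=False)
            # True above the diagonal marks positions a token may not attend to
            mask = torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
            self.register_buffer("mask", mask)

        def forward(self, x):  # x: (batch, seq_len, d_in)
            seq_len = x.shape[1]
            queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)
            scores = queries @ keys.transpose(1, 2) / keys.shape[-1] ** 0.5
            scores = scores.masked_fill(self.mask[:seq_len, :seq_len], float("-inf"))
            weights = torch.softmax(scores, dim=-1)
            return weights @ values  # (batch, seq_len, d_out)

    attn = CausalSelfAttention(d_in=768, d_out=768, context_length=1024)
    out = attn(torch.randn(2, 16, 768))  # shape: (2, 16, 768)

The causal mask is what makes this a GPT-style decoder: each token can only attend to itself and earlier positions.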

Quick Start & Requirements

  • Install: git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
  • Prerequisites: Python and PyTorch. A GPU is used automatically if available (see the device-selection sketch after this list). Refer to the setup directory for detailed environment instructions.
  • Resources: Designed to run on conventional laptops.
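The automatic GPU usage mentioned above typically amounts to a standard PyTorch device check along these lines; this is a sketch of the common pattern, and the repository's own setup code may differ:

    import torch

    # Standard PyTorch device-selection pattern (the repo's code may differ)
    if torch.cuda.is_available():
        device = torch.device("cuda")   # NVIDIA GPU
    elif torch.backends.mps.is_available():
        device = torch.device("mps")    # Apple Silicon
    else:
        device = torch.device("cpu")

    print(f"Using device: {device}")
    # Models and tensors are then moved with .to(device) before training.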

Highlighted Details

  • Comprehensive coverage from foundational concepts to advanced finetuning techniques such as LoRA (a minimal sketch follows this list).
  • Includes code for loading and finetuning larger pretrained models.
  • Bonus materials offer deeper dives into specific topics like BPE implementations and performance optimization.
  • Demonstrates building user interfaces for interacting with trained models.
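As a taste of the LoRA technique highlighted above, the following is a minimal, hypothetical wrapper around nn.Linear: the pretrained weights are frozen and a trainable low-rank update is added. The class name and hyperparameters are illustrative, not the repository's exact code.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Hypothetical minimal LoRA wrapper: freeze a pretrained Linear layer
        # and learn a low-rank additive update (not the repo's exact code).
        def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.linear = linear
            for p in self.linear.parameters():
                p.requires_grad_(False)  # keep pretrained weights frozen
            self.A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
            self.B = nn.Parameter(torch.zeros(rank, linear.out_features))  # zero-init: no change at start
            self.scaling = alpha / rank

        def forward(self, x):
            # Frozen base output plus scaled low-rank correction x @ A @ B
            return self.linear(x) + (x @ self.A @ self.B) * self.scaling

    base = nn.Linear(768, 768)           # stands in for a pretrained layer
    lora = LoRALinear(base, rank=4)
    out = lora(torch.randn(2, 10, 768))  # only A and B receive gradients

Because only the small A and B matrices are trained, LoRA finetunes large pretrained models at a fraction of the memory cost of full finetuning.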

Maintenance & Community

The repository is associated with the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka. Feedback and questions are welcomed via the Manning Forum or GitHub Discussions. Contributions to the main chapter code are not accepted to maintain consistency with the book.

Licensing & Compatibility

The repository code is provided under a permissive open-source license that allows commercial use and integration into closed-source projects; consult the repository's LICENSE file for the exact terms. The book itself is copyrighted.

Limitations & Caveats

The primary focus is educational, demonstrating LLM principles with smaller, functional models. While it covers finetuning larger models, the core implementation is geared towards understanding rather than achieving state-of-the-art performance on massive datasets without significant adaptation.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 22
  • Issues (30d): 9
  • Star History: 13,766 stars in the last 90 days

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

Explore Similar Projects

cookbook by EleutherAI

Deep learning resource for practical model work

created 1 year ago, updated 4 days ago
809 stars
Top 0.1% on sourcepulse