LLMs-from-scratch by rasbt

Educational resource for LLM construction in PyTorch

created 2 years ago
60,573 stars

Top 0.4% on sourcepulse

Project Summary

This repository provides the complete PyTorch code for building, pretraining, and finetuning a GPT-like Large Language Model (LLM) from scratch, mirroring techniques used in models like ChatGPT. It's an educational resource, primarily for developers and researchers aiming to understand LLM internals through hands-on implementation, as detailed in the accompanying Manning book.

How It Works

The project guides users through implementing the core LLM components: byte pair encoding (BPE) tokenization, attention mechanisms, and the GPT architecture itself. It then covers pretraining on unlabeled text and finetuning for specific tasks such as text classification and instruction following, using a step-by-step, code-centric approach that breaks complex concepts into small, executable pieces.
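To make the attention step concrete, here is a minimal single-head causal self-attention layer in plain PyTorch. This is an illustrative sketch, not the book's exact implementation; the class name and dimensions are chosen for the example.

    import torch
    import torch.nn as nn

    class CausalSelfAttention(nn.Module):
        # Minimal single-head causal self-attention; illustrative only,
        # not the book's exact implementation.
        def __init__(self, d_in, d_out, context_length):
            super().__init__()
            self.W_query = nn.Linear(d_in, d_out, bias=False)
            self.W_key = nn.Linear(d_in, d_out, bias=False)
            self.W_value = nn.Linear(d_in, d_out, bias=False)
            # True above the diagonal marks positions a token may not attend to
            mask = torch.triu(torch.ones(context_length, context_length), diagonal=1).bool()
            self.register_buffer("mask", mask)

        def forward(self, x):  # x: (batch, seq_len, d_in)
            seq_len = x.shape[1]
            queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)
            scores = queries @ keys.transpose(1, 2) / keys.shape[-1] ** 0.5
            scores = scores.masked_fill(self.mask[:seq_len, :seq_len], float("-inf"))
            weights = torch.softmax(scores, dim=-1)
            return weights @ values  # (batch, seq_len, d_out)

    attn = CausalSelfAttention(d_in=768, d_out=768, context_length=1024)
    out = attn(torch.randn(2, 16, 768))  # shape: (2, 16, 768)

The causal mask is what makes this a GPT-style decoder: each token can only attend to itself and earlier positions.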

Quick Start & Requirements

  • Install: git clone --depth 1 https://github.com/rasbt/LLMs-from-scratch.git
  • Prerequisites: Python and PyTorch. A GPU is used automatically if available (see the device-selection sketch after this list). Refer to the setup directory for detailed environment instructions.
  • Resources: Designed to run on conventional laptops.
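The automatic GPU usage mentioned above typically amounts to a standard PyTorch device check along these lines; this is a sketch of the common pattern, and the repository's own setup code may differ:

    import torch

    # Standard PyTorch device-selection pattern (the repo's code may differ)
    if torch.cuda.is_available():
        device = torch.device("cuda")   # NVIDIA GPU
    elif torch.backends.mps.is_available():
        device = torch.device("mps")    # Apple Silicon
    else:
        device = torch.device("cpu")

    print(f"Using device: {device}")
    # Models and tensors are then moved with .to(device) before training.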

Highlighted Details

  • Comprehensive coverage from foundational concepts to advanced finetuning techniques such as LoRA (a minimal sketch follows this list).
  • Includes code for loading and finetuning larger pretrained models.
  • Bonus materials offer deeper dives into specific topics like BPE implementations and performance optimization.
  • Demonstrates building user interfaces for interacting with trained models.
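As a taste of the LoRA technique highlighted above, the following is a minimal, hypothetical wrapper around nn.Linear: the pretrained weights are frozen and a trainable low-rank update is added. The class name and hyperparameters are illustrative, not the repository's exact code.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Hypothetical minimal LoRA wrapper: freeze a pretrained Linear layer
        # and learn a low-rank additive update (not the repo's exact code).
        def __init__(self, linear: nn.Linear, rank: int = 8, alpha: float = 16.0):
            super().__init__()
            self.linear = linear
            for p in self.linear.parameters():
                p.requires_grad_(False)  # keep pretrained weights frozen
            self.A = nn.Parameter(torch.randn(linear.in_features, rank) * 0.01)
            self.B = nn.Parameter(torch.zeros(rank, linear.out_features))  # zero-init: no change at start
            self.scaling = alpha / rank

        def forward(self, x):
            # Frozen base output plus scaled low-rank correction x @ A @ B
            return self.linear(x) + (x @ self.A @ self.B) * self.scaling

    base = nn.Linear(768, 768)           # stands in for a pretrained layer
    lora = LoRALinear(base, rank=4)
    out = lora(torch.randn(2, 10, 768))  # only A and B receive gradients

Because only the small A and B matrices are trained, LoRA finetunes large pretrained models at a fraction of the memory cost of full finetuning.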

Maintenance & Community

The repository is associated with the book "Build a Large Language Model (From Scratch)" by Sebastian Raschka. Feedback and questions are welcomed via the Manning Forum or GitHub Discussions. Contributions to the main chapter code are not accepted to maintain consistency with the book.

Licensing & Compatibility

The repository code is provided under a permissive open-source license that allows commercial use and integration into closed-source projects; consult the repository's LICENSE file for the exact terms. The book itself is copyrighted.

Limitations & Caveats

The primary focus is educational, demonstrating LLM principles with smaller, functional models. While it covers finetuning larger models, the core implementation is geared towards understanding rather than achieving state-of-the-art performance on massive datasets without significant adaptation.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 22
  • Issues (30d): 9
  • Star History: 13,766 stars in the last 90 days

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake).

Explore Similar Projects

cookbook by EleutherAI

Deep learning resource for practical model work

created 1 year ago, updated 4 days ago
809 stars
Top 0.1% on sourcepulse