hlb-gpt by tysam-code

Researcher's toolbench for GPT model exploration

Created 2 years ago · 349 stars · Top 80.8% on sourcepulse

View on GitHub: https://github.com/tysam-code/hlb-gpt
Project Summary

This repository provides hlb-gpt, a minimalistic and performant toolbench for researchers to rapidly prototype and experiment with GPT models. It aims to minimize time-to-result for LLM research, offering a well-documented and simple codebase with sensible defaults, suitable for both experienced researchers and newcomers to ML.

How It Works

hlb-gpt is built around a novel LatentAttention block that fuses the attention and MLP layers for efficiency. It also uses learnable linear position embeddings, which let the attention length vary dynamically, and a dynamic microbatch scheduler driven by expected gradient norms. The design prioritizes speed and simplicity, enabling rapid iteration on LLM ideas.
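
The fused block itself lives in the repository's main.py; as a rough illustration of the idea only (not the author's implementation), a PyTorch sketch of an attention path and an MLP path sharing one normalization and one residual update might look like the following. The class name FusedLatentBlock and all hyperparameters are assumptions made for this sketch.

```python
# Minimal sketch of a fused attention + MLP block in the spirit of the
# LatentAttention idea described above. This is NOT the hlb-gpt code:
# the class name, shapes, and hyperparameters are assumptions chosen
# only to illustrate fusing both sublayers into one residual update.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedLatentBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int, mlp_ratio: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.mlp_in = nn.Linear(dim, mlp_ratio * dim, bias=False)
        # A single output projection merges both paths back into the residual.
        self.out = nn.Linear(dim + mlp_ratio * dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        h = self.norm(x)

        # Attention path (causal), computed from the shared normalized input.
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q = q.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)

        # MLP path, also computed from the same normalized input.
        mlp = F.gelu(self.mlp_in(h))

        # One residual update covers both sublayers.
        return x + self.out(torch.cat([attn, mlp], dim=-1))


if __name__ == "__main__":
    block = FusedLatentBlock(dim=256, num_heads=8)
    tokens = torch.randn(2, 16, 256)
    print(block(tokens).shape)  # torch.Size([2, 16, 256])
```

Fusing the two sublayers this way trades a little per-block flexibility for a simpler, cheaper layer, which is in keeping with the project's speed-and-simplicity emphasis.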

Quick Start & Requirements

  • Install & run: git clone https://github.com/tysam-code/hlb-gpt && cd hlb-gpt && python -m pip install -r requirements.txt && python main.py
  • Prerequisites: Requires a 40GB A100 GPU. Developed and tested in Google Colab.
  • Performance: reaches a validation loss below 3.8 on WikiText-103 in under 100 seconds on a single A100.

Highlighted Details

  • Achieves ~3.80 validation loss on WikiText-103 in ~100 seconds on a single A100.
  • Scales from 46M to 3B parameters with a single parameter change (scaling feature in alpha).
  • Implements LatentAttention, learnable linear position embeddings, and dynamic microbatch scheduling (see the scheduling sketch after this list).
  • The codebase is just over 300 lines, inspired by nanoGPT but significantly evolved from it.
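
The README describes the microbatch scheduler only at a high level; the sketch below illustrates one plausible way to grow or shrink gradient accumulation based on how the measured gradient norm compares to a running estimate. It is not hlb-gpt's scheduler: the class name MicrobatchScheduler, the thresholds, and the update rule are assumptions made for illustration.

```python
# Illustrative sketch of gradient-norm-driven microbatch scheduling.
# This is NOT hlb-gpt's scheduler; the update rule, thresholds, and
# names are assumptions made for this example.
import torch


class MicrobatchScheduler:
    def __init__(self, init_microbatches: int = 1, max_microbatches: int = 64,
                 ema_decay: float = 0.95):
        self.microbatches = init_microbatches
        self.max_microbatches = max_microbatches
        self.ema_decay = ema_decay
        self.grad_norm_ema = None  # running estimate of the "expected" grad norm

    def update(self, model: torch.nn.Module) -> int:
        # Measure the current global gradient norm.
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        if not grads:
            return self.microbatches
        grad_norm = torch.norm(torch.stack([g.norm() for g in grads])).item()

        # Track an exponential moving average of the gradient norm.
        if self.grad_norm_ema is None:
            self.grad_norm_ema = grad_norm
        else:
            self.grad_norm_ema = (self.ema_decay * self.grad_norm_ema
                                  + (1 - self.ema_decay) * grad_norm)

        # Noisier-than-expected gradients -> accumulate over more microbatches;
        # calmer-than-expected gradients -> use fewer.
        if grad_norm > 1.25 * self.grad_norm_ema:
            self.microbatches = min(self.microbatches * 2, self.max_microbatches)
        elif grad_norm < 0.75 * self.grad_norm_ema:
            self.microbatches = max(self.microbatches // 2, 1)
        return self.microbatches
```

In a training loop, one would call update(model) after backward() and use the returned count to decide how many microbatches to accumulate over on the next step.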

Maintenance & Community

  • Primarily maintained by tysam-code (Fern).
  • Contact available via Twitter DMs or email (hire.tysam@gmail.com) for consulting/contract work.
  • Development is self-funded, with additional support via Patreon.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for research; commercial use implications are unclear without a specified license.

Limitations & Caveats

The model scaling feature is currently in alpha, with large model hyperparameters requiring tuning. The codebase assumes a 40GB A100 GPU, with broader memory support pending.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

hlb-gpt is starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), George Hotz (author of tinygrad; founder of the tiny corp and comma.ai), and 10 more.

Explore Similar Projects

  • TinyLlama by jzhang38 — tiny pretraining project for a 1.1B Llama model (9k stars; top 0.3% on sourcepulse; created 1 year ago, updated 1 year ago).