hlb-gpt by tysam-code

Researcher's toolbench for GPT model exploration

Created 2 years ago

355 stars

Top 78.8% on SourcePulse

View on GitHub

3 Experts Love This Project

George Hotz

Author of tinygrad; Founder of the tiny corp, comma.ai

Daniel Han

Cofounder of Unsloth

Wing Lian

Founder of Axolotl AI

Project Summary

This repository provides hlb-gpt, a minimalistic and performant toolbench for researchers to rapidly prototype and experiment with GPT models. It aims to minimize time-to-result for LLM research, offering a well-documented and simple codebase with sensible defaults, suitable for both experienced researchers and newcomers to ML.

How It Works

hlb-gpt employs a novel LatentAttention block that fuses attention and MLP layers for efficiency. It utilizes learnable linear position embeddings for dynamic attention length and a dynamic microbatch scheduler based on expected gradient norms. This approach prioritizes speed and simplicity, enabling rapid iteration on LLM ideas.

Quick Start & Requirements

Install: git clone https://github.com/tysam-code/hlb-gpt && cd hlb-gpt && python -m pip install -r requirements.txt && python main.py
Prerequisites: Requires a 40GB A100 GPU. Developed and tested in Google Colab.
Resources: Achieves <3.8 validation loss on WikiText-103 in <100 seconds on a single A100.

Highlighted Details

Achieves ~3.80 validation loss on WikiText-103 in ~100 seconds on a single A100.
Scales from 46M to 3B parameters with a single parameter change (scaling feature in alpha).
Implements LatentAttention, learnable linear position embeddings, and dynamic microbatch scheduling.
Codebase is over 300 lines, inspired by nanoGPT but evolved significantly.

Maintenance & Community

Primarily maintained by tysam-code (Fern).
Contact available via Twitter DMs or email (hire.tysam@gmail.com) for consulting/contract work.
Support is self-funded and via Patreon.

Licensing & Compatibility

License: Not explicitly stated in the README.
Compatibility: Designed for research; commercial use implications are unclear without a specified license.

Limitations & Caveats

The model scaling feature is currently in alpha, with large model hyperparameters requiring tuning. The codebase assumes a 40GB A100 GPU, with broader memory support pending.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

0 stars in the last 30 days