hlb-gpt by tysam-code

Researcher's toolbench for GPT model exploration

Created 2 years ago
349 stars

Top 79.6% on SourcePulse

Project Summary

This repository provides hlb-gpt, a minimalistic and performant toolbench for researchers to rapidly prototype and experiment with GPT models. It aims to minimize time-to-result for LLM research, offering a well-documented and simple codebase with sensible defaults, suitable for both experienced researchers and newcomers to ML.

How It Works

hlb-gpt employs a novel LatentAttention block that fuses the attention and MLP layers for efficiency. Learnable linear position embeddings allow the attention length to vary dynamically, and a dynamic microbatch scheduler adjusts batching based on expected gradient norms. The design prioritizes speed and simplicity, enabling rapid iteration on LLM ideas.
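Below is a minimal PyTorch sketch of the general idea of a fused attention-plus-MLP residual block with a per-head learnable linear position bias. It is an illustration under assumptions only, not the repository's actual LatentAttention code; the module and parameter names here (FusedBlock, pos_slope, mlp_ratio) are invented for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusedBlock(nn.Module):
    """Residual block that computes an attention path and an MLP path from one
    shared LayerNorm and adds both back in a single residual update.
    Illustrative only: names and fusion details are assumptions, not hlb-gpt's code."""

    def __init__(self, dim: int, n_heads: int, mlp_ratio: int = 4):
        super().__init__()
        assert dim % n_heads == 0
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.norm = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.attn_out = nn.Linear(dim, dim, bias=False)
        self.mlp_in = nn.Linear(dim, mlp_ratio * dim, bias=False)
        self.mlp_out = nn.Linear(mlp_ratio * dim, dim, bias=False)
        # Learnable linear position bias: one slope per head, applied to the
        # token distance (a stand-in for learnable linear position embeddings).
        self.pos_slope = nn.Parameter(torch.full((n_heads,), -0.05))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        h = self.norm(x)

        # Attention path with a causal mask plus the learnable linear bias.
        q, k, v = self.qkv(h).chunk(3, dim=-1)
        q, k, v = [t.view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v)]
        idx = torch.arange(T, device=x.device)
        dist = (idx[:, None] - idx[None, :]).clamp(min=0).float()   # (T, T)
        bias = self.pos_slope.view(-1, 1, 1) * dist                 # (H, T, T)
        causal = torch.triu(
            torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        attn = F.scaled_dot_product_attention(q, k, v, attn_mask=bias + causal)
        attn = self.attn_out(attn.transpose(1, 2).reshape(B, T, C))

        # MLP path computed from the same normalized input.
        mlp = self.mlp_out(F.gelu(self.mlp_in(h)))

        # One fused residual update instead of two sequential sub-layers.
        return x + attn + mlp
```

Fusing the two sub-layers into a single residual update saves a LayerNorm and a residual read-modify-write per block, which is one plausible source of the speed the project targets; the exact fusion and dynamic-attention-length mechanism in hlb-gpt may differ.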

Quick Start & Requirements

  • Install & run: git clone https://github.com/tysam-code/hlb-gpt && cd hlb-gpt && python -m pip install -r requirements.txt && python main.py
  • Prerequisites: Requires a 40GB A100 GPU. Developed and tested in Google Colab.
  • Performance: Achieves <3.8 validation loss on WikiText-103 in <100 seconds on a single A100.

Highlighted Details

  • Achieves ~3.80 validation loss on WikiText-103 in ~100 seconds on a single A100.
  • Scales from 46M to 3B parameters with a single parameter change (scaling feature in alpha).
  • Implements LatentAttention, learnable linear position embeddings, and dynamic microbatch scheduling (see the scheduler sketch after this list).
  • Compact codebase of just over 300 lines, inspired by nanoGPT but significantly evolved from it.
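
As referenced above, here is one plausible way a gradient-norm-driven microbatch scheduler could be wired up. This is a hedged sketch: the EMA rule, thresholds, and class name (MicrobatchScheduler) are assumptions, not the repository's actual scheduling logic.

```python
import torch


class MicrobatchScheduler:
    """Toy gradient-norm-driven microbatch scheduler (an assumption-laden
    sketch, not hlb-gpt's actual rule). It tracks an EMA of the global
    gradient norm and doubles the number of gradient-accumulation steps --
    i.e. the effective batch size -- when the observed norm falls well below
    that expectation, so weaker gradient signal gets averaged over more
    microbatches."""

    def __init__(self, accum_steps: int = 1, max_accum_steps: int = 32,
                 ema_decay: float = 0.95, grow_threshold: float = 0.75):
        self.accum_steps = accum_steps
        self.max_accum_steps = max_accum_steps
        self.ema_decay = ema_decay
        self.grow_threshold = grow_threshold
        self.expected_norm = None  # running EMA of observed gradient norms

    @torch.no_grad()
    def update(self, model: torch.nn.Module) -> int:
        grads = [p.grad for p in model.parameters() if p.grad is not None]
        if not grads:
            return self.accum_steps

        # Global L2 norm of all parameter gradients.
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()

        # Exponential moving average serves as the "expected" gradient norm.
        if self.expected_norm is None:
            self.expected_norm = norm
        else:
            self.expected_norm = (self.ema_decay * self.expected_norm
                                  + (1.0 - self.ema_decay) * norm)

        # If gradients have shrunk well below expectation, accumulate over
        # more microbatches before each optimizer step.
        if (norm < self.grow_threshold * self.expected_norm
                and self.accum_steps < self.max_accum_steps):
            self.accum_steps *= 2
        return self.accum_steps
```

In a training loop, update(model) would typically be called after backward(), with the optimizer stepping only once accum_steps microbatch gradients have been accumulated (and the loss scaled by 1 / accum_steps).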

Maintenance & Community

  • Primarily maintained by tysam-code (Fern).
  • Contact available via Twitter DMs or email (hire.tysam@gmail.com) for consulting/contract work.
  • Development is self-funded and supported via Patreon.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Designed for research; commercial use implications are unclear without a specified license.

Limitations & Caveats

The model-scaling feature is currently in alpha, and hyperparameters for larger models still require tuning. The codebase assumes a 40GB A100 GPU; broader memory support is pending.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Vincent Weisser (cofounder of Prime Intellect), and 4 more.

Sophia by Liuhong99

Optimizer for language model pre-training (research paper)

Created 2 years ago, updated 1 year ago
970 stars
Top 0.1%