Researcher's toolbench for GPT model exploration
This repository provides hlb-gpt, a minimalistic and performant toolbench for researchers to rapidly prototype and experiment with GPT models. It aims to minimize time-to-result for LLM research, offering a well-documented and simple codebase with sensible defaults, suitable for both experienced researchers and newcomers to ML.
How It Works
hlb-gpt employs a novel LatentAttention block that fuses the attention and MLP layers into a single unit for efficiency. Learnable linear position embeddings let the attention length vary dynamically, and a dynamic microbatch scheduler sizes batches based on expected gradient norms. The design prioritizes speed and simplicity, enabling rapid iteration on LLM ideas.
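To make the fused-block idea concrete, here is a minimal PyTorch sketch. The class, parameter names, and the exact form of the position bias are hypothetical illustrations, not the repository's actual API: it shows one shared input projection feeding both the attention path and the MLP path, with a learnable per-head linear distance penalty standing in for the learnable linear position embeddings.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedAttentionMLP(nn.Module):
    """Hypothetical sketch of a fused block: one shared input projection
    produces q, k, v and the MLP hidden state together, and the two
    branch outputs are summed into a single residual update."""

    def __init__(self, dim: int, num_heads: int, mlp_mult: int = 4):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.mlp_dim = mlp_mult * dim
        self.norm = nn.LayerNorm(dim)
        # One wide matmul covers both the attention and MLP inputs.
        self.in_proj = nn.Linear(dim, 3 * dim + self.mlp_dim)
        self.attn_out = nn.Linear(dim, dim)
        self.mlp_out = nn.Linear(self.mlp_dim, dim)
        # Learnable linear position bias: one slope per head scaling the
        # query-key distance (a hypothetical stand-in for the repo's
        # learnable linear position embeddings).
        self.pos_slope = nn.Parameter(torch.ones(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        h = self.norm(x)
        qkv, m = self.in_proj(h).split([3 * C, self.mlp_dim], dim=-1)
        q, k, v = (t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in qkv.chunk(3, dim=-1))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.head_dim)
        # Learned linear distance penalty per head.
        idx = torch.arange(T, device=x.device)
        dist = (idx[:, None] - idx[None, :]).clamp(min=0).float()  # (T, T)
        scores = scores - self.pos_slope.view(1, -1, 1, 1) * dist
        # Causal mask: each position attends only to itself and the past.
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool,
                                       device=x.device), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        attn = (F.softmax(scores, dim=-1) @ v).transpose(1, 2).reshape(B, T, C)
        # Sum the two branch outputs into one residual update.
        return x + self.attn_out(attn) + self.mlp_out(F.gelu(m))
```

The microbatch-scheduling idea can be sketched similarly. The logic below is entirely hypothetical (the repository's actual scheduling rule may differ); it only illustrates the general principle of tying microbatch size to a running gradient-norm estimate:

```python
def update_microbatch_size(size, grad_norm_ema, target_norm,
                           min_size=1, max_size=64):
    """Hypothetical rule: when the running gradient-norm estimate drops
    below a target, gradients are less noisy and a larger microbatch is
    affordable; when it climbs well above the target, back off."""
    if grad_norm_ema < target_norm:
        return min(size * 2, max_size)
    if grad_norm_ema > 2 * target_norm:
        return max(size // 2, min_size)
    return size
```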
Quick Start & Requirements
```bash
git clone https://github.com/tysam-code/hlb-gpt && cd hlb-gpt
python -m pip install -r requirements.txt
python main.py
```
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Model scaling is currently in alpha; hyperparameters for larger models still require tuning. The codebase assumes a single 40GB A100 GPU, with support for other memory capacities pending.