slowrun by qlabs-eng

LLM training benchmark prioritizing deep learning over speed

Created 2 weeks ago


286 stars

Top 91.9% on SourcePulse

View on GitHub
Project Summary

Summary

qlabs-eng/slowrun redefines language model benchmarking by prioritizing learning efficiency over raw speed in a fixed-data, unlimited-compute setting. It targets researchers and engineers exploring algorithms that benefit from extensive computation and regularization, removing time constraints in pursuit of better generalization.

How It Works

The benchmark trains models on a fixed 100M-token FineWeb dataset with no limit on training time. It contrasts with speed-focused benchmarks by enabling computationally intensive methods: large models (e.g., 2.7B parameters) with heavy regularization (e.g., high weight decay, dropout). This unlocks algorithmic avenues, such as evolutionary search, that are infeasible under strict time limits, aiming for deeper learning and superior generalization.
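To make the regime concrete, here is a minimal sketch of the kinds of settings it favors. All field names and values are hypothetical illustrations, not taken from the repo, except the 2.7B-parameter model size and 100M-token data budget mentioned above.

```python
from dataclasses import dataclass

# Hypothetical config sketch of a "slow run": a large model trained for many
# epochs over a small fixed dataset, held in check by heavy regularization.
@dataclass
class SlowRunConfig:
    n_params: float = 2.7e9            # large model, per the README's 2.7B example
    dataset_tokens: int = 100_000_000  # fixed FineWeb data budget
    weight_decay: float = 0.5          # "heavy" regularization, well above the usual ~0.1
    dropout: float = 0.2               # nonzero dropout, unusual for speed-focused runs
    epochs: int = 40                   # many passes over the fixed data (illustrative)

    @property
    def tokens_seen(self) -> int:
        # Total training tokens = epochs * dataset size. With unlimited compute
        # this can far exceed the tokens a one-hour speedrun could ever process.
        return self.epochs * self.dataset_tokens

cfg = SlowRunConfig()
print(f"{cfg.tokens_seen:,} tokens seen")  # 4,000,000,000 tokens seen
```

The point of the sketch is the trade: where a timed benchmark rewards squeezing one or two epochs into a fixed wall-clock budget, this regime rewards whatever combination of model size, epoch count, and regularization extracts the most learning from the same 100M tokens.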

Quick Start & Requirements

Clone the repo (git clone https://github.com/qlabs-eng/slowrun.git), install dependencies (pip install -r requirements.txt), and prepare the data (python prepare_data.py). Competitive runs require significant hardware, typically an 8xH100 node, with training times ranging from ~47 minutes (baseline) to many hours for advanced entries.
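The setup steps above, as a copy-pasteable sequence (the clone URL and script names are as stated in the summary; any other flags you might need are not documented here):

```shell
# Clone the benchmark and enter the repo
git clone https://github.com/qlabs-eng/slowrun.git
cd slowrun

# Install Python dependencies
pip install -r requirements.txt

# Download and prepare the fixed 100M-token FineWeb dataset
python prepare_data.py
```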

Highlighted Details

The project features three tracks: Limited Compute (1 hour, 8xH100), Tiny Compute (15 minutes, 8xH100), and Unlimited Compute. The baseline achieves a validation loss of 3.402. Records show improvements via architectural changes (U-Net, attention gating), optimized training, and advanced regularization. Submissions are made via pull requests.

Maintenance & Community

Active contributors are listed, and submissions are managed through pull requests. No specific community channels (e.g., Discord, Slack) are detailed.

Licensing & Compatibility

The README does not specify a software license, leaving terms of use, distribution, and modification unclear and making commercial-use compatibility impossible to assess. This omission is a significant adoption blocker.

Limitations & Caveats

The benchmark targets the "infinite compute, fixed data" regime, which may limit direct applicability to other settings. The 8xH100 hardware requirement is a significant barrier, whether winning techniques scale to larger datasets remains an open question, and the unspecified license is a further adoption impediment.

Health Check
Last Commit

15 hours ago

Responsiveness

Inactive

Pull Requests (30d)
27
Issues (30d)
7
Star History
286 stars in the last 15 days

Explore Similar Projects

Starred by Victor Taelin (Author of Bend, Kind, HVM), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 2 more.

nanoT5 by PiotrNawrot

Top 0.1% · 1k stars
PyTorch code for T5 pre-training and fine-tuning on a single GPU
Created 3 years ago
Updated 1 year ago
Starred by Benjamin Bolte (Cofounder of K-Scale Labs), Albert Gu (Cofounder of Cartesia; Professor at CMU), and 2 more.

Muon by KellerJordan

Top 0.8% · 2k stars
Optimizer for neural network hidden layers
Created 1 year ago
Updated 1 month ago