qlabs-eng/slowrun: LLM training benchmark prioritizing deep learning over speed
Summary
qlabs-eng/slowrun redefines language model benchmarking by prioritizing learning efficiency over speed in a fixed-data, unlimited-compute setting. It targets researchers and engineers exploring algorithms that benefit from extensive computation and heavy regularization, with the goal of improving generalization once time constraints are removed.
How It Works
This benchmark trains models on a fixed 100M-token FineWeb dataset, allowing for maximum learning time. It contrasts with speed-focused benchmarks by enabling computationally intensive methods, such as large models (e.g., 2.7B parameters) with heavy regularization (e.g., high weight decay, dropout). This approach unlocks algorithmic avenues, such as evolutionary search, that are infeasible under strict time limits, aiming for deeper learning and superior generalization.
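As a rough illustration of the fixed-data regime described above (field names and regularization values here are hypothetical, not taken from the repo; only the 100M-token budget, the ~2.7B-parameter scale, and the baseline loss come from this summary):

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    # Fixed-data regime: the token budget is constant; wall-clock time is not.
    dataset_tokens: int = 100_000_000  # 100M FineWeb tokens (per the benchmark)
    n_params: int = 2_700_000_000      # large model, e.g. ~2.7B parameters
    # Heavy regularization becomes viable once time limits are removed
    # (values below are illustrative, not the repo's settings).
    weight_decay: float = 0.5
    dropout: float = 0.2
    epochs: int = 50                   # many passes over the same fixed dataset

cfg = TrainConfig()
# Total tokens processed grows with epochs, while unique data stays fixed,
# which is why regularization carries the load against overfitting.
tokens_processed = cfg.dataset_tokens * cfg.epochs
print(tokens_processed)
```

The point of the sketch is the asymmetry: compute scales with `epochs`, but `dataset_tokens` never changes, so generalization must come from the model and its regularization rather than from more data.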
Quick Start & Requirements
Clone the repo (git clone https://github.com/qlabs-eng/slowrun.git), install dependencies (pip install -r requirements.txt), and prepare the data (python prepare_data.py). Competitive runs require significant hardware, typically an 8xH100 node, with training times ranging from ~47 minutes for the baseline to many hours for advanced entries.
Highlighted Details
The project features three tracks: Limited Compute (1 hour, 8xH100), Tiny Compute (15 minutes, 8xH100), and Unlimited Compute. The baseline achieves a 3.402 validation loss. Records show improvements via architectural changes (U-Net, attention gating), optimized training, and advanced regularization. Submissions are via pull requests.
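The track structure and record rule can be summarized in code (a sketch; the field names and the helper function are my own, while the budgets and the baseline loss come from this summary):

```python
# Track budgets as described above: wall-clock limit in minutes on an
# 8xH100 node; None means unlimited compute.
TRACKS = {
    "Limited Compute": {"minutes": 60, "gpus": "8xH100"},
    "Tiny Compute": {"minutes": 15, "gpus": "8xH100"},
    "Unlimited Compute": {"minutes": None, "gpus": "8xH100"},
}

BASELINE_VAL_LOSS = 3.402  # reported baseline validation loss

def beats_baseline(val_loss: float) -> bool:
    """A submission improves on the baseline if its validation loss is lower."""
    return val_loss < BASELINE_VAL_LOSS

print(beats_baseline(3.35))   # lower loss: an improvement
print(beats_baseline(3.50))   # higher loss: not an improvement
```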
Maintenance & Community
Active contributors are listed, and submissions are managed through pull requests. No specific community channels (e.g., Discord, Slack) are detailed.
Licensing & Compatibility
The README does not specify the software license, leaving terms of use, distribution, and modification unclear. This omission is a critical adoption blocker, preventing assessment of commercial-use compatibility.
Limitations & Caveats
The benchmark's focus is the "infinite compute, fixed data" regime, potentially limiting direct applicability to other scenarios. High-end hardware (8xH100) is a significant barrier. Whether winning techniques scale to larger datasets remains an open question. The unspecified license is a major adoption impediment.