HRM-Text by sapientinc

Efficient text generation model pretraining framework

Created 1 week ago


776 stars

Top 44.5% on SourcePulse

Project Summary

Summary

HRM-Text offers an accessible framework for pretraining foundation models from scratch with far less compute and data than conventional approaches. Aimed at researchers and engineers, it brings LLM pretraining within reach of small budgets, with quoted compute costs starting at about $800, and so opens large-scale model creation to a much wider group.

How It Works

The model is built on HRM (Hierarchical Reasoning Model), a hierarchical recurrent architecture, combined with PrefixLM sequence packing and FlashAttention 3 kernels for efficient attention. Training uses PyTorch FSDP2 (Fully Sharded Data Parallel) to shard the model across GPUs. The project reports reaching its pretraining results with orders of magnitude less compute and data than conventional scaling approaches.
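Assuming the usual meaning of PrefixLM packing, several documents share one training sequence: each document's prefix is attended bidirectionally, the rest of the document is modeled causally, and no token attends across document boundaries. The sketch below builds that pattern as a dense boolean mask purely to illustrate the semantics. The function name and the per-document prefix lengths are hypothetical; HRM-Text's actual packing code is not described here, and fused kernels such as FlashAttention normally receive sequence boundaries rather than a dense mask.

```python
"""Reference PrefixLM mask for packed sequences (illustrative, not HRM-Text code)."""
from typing import List


def packed_prefixlm_mask(doc_lengths: List[int], prefix_lengths: List[int]) -> List[List[bool]]:
    """Return a mask where mask[i][j] is True if query position i may attend to key j.

    Within a document, every token sees the whole prefix (the bidirectional part),
    and tokens after the prefix also see earlier tokens causally. Tokens never
    attend across document boundaries.
    """
    if len(doc_lengths) != len(prefix_lengths):
        raise ValueError("expected one prefix length per document")

    doc_id: List[int] = []  # which packed document each position belongs to
    offset: List[int] = []  # position of each token within its own document
    for d, (length, prefix) in enumerate(zip(doc_lengths, prefix_lengths)):
        if not 0 <= prefix <= length:
            raise ValueError("prefix length must lie within its document")
        doc_id.extend([d] * length)
        offset.extend(range(length))

    total = len(doc_id)
    mask = [[False] * total for _ in range(total)]
    for i in range(total):
        prefix = prefix_lengths[doc_id[i]]
        for j in range(total):
            same_doc = doc_id[i] == doc_id[j]
            # Key j is visible if it is in the prefix, or if it is not later than i.
            visible = offset[j] < prefix or offset[j] <= offset[i]
            mask[i][j] = same_doc and visible
    return mask


if __name__ == "__main__":
    # Two packed documents: lengths 3 and 2, with prefixes of 2 and 1 tokens.
    for row in packed_prefixlm_mask([3, 2], [2, 1]):
        print("".join("x" if v else "." for v in row))
```

For this example the rows print as `xx...`, `xx...`, `xxx..`, `...x.`, `...xx`: the two prefix tokens of the first document see each other, its third token sees everything before it, and the second document is isolated from the first. A prefix length of 0 reduces to plain causal packing.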

Quick Start & Requirements

  • Installation: Docker is recommended (docker run --gpus all --ipc=host --network=host -it -v "$PWD":/workspace sapientai/hrm-text:latest). Alternatively, pip install -r requirements.txt after setting up PyTorch, CUDA, and FlashAttention 3.
  • Prerequisites: Hopper-class GPUs are required because the project relies on FlashAttention 3. CUDA, PyTorch, and FlashAttention 3 must be installed. Multi-node setups should verify NCCL communication before training. A Weights & Biases account is needed for experiment tracking.
  • Resource Footprint: Pretraining a 0.6B model (Size L) requires 8 H100 GPUs for 50 hours ($800). A 1B model (Size XL) needs 16 H100s for 46 hours ($1472). Evaluation typically uses one 80GB GPU.
  • Links: Docker image: sapientai/hrm-text:latest. Paper: https://arxiv.org/abs/2605.20613.
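Both quoted costs imply the same flat rate of $2 per H100 GPU-hour: 8 × 50 = 400 GPU-hours for $800, and 16 × 46 = 736 GPU-hours for $1472. The sketch below reproduces that arithmetic so the estimate can be rescaled to a different hourly price. The helper names are hypothetical and not part of HRM-Text.

```python
"""Back-of-the-envelope GPU cost arithmetic (hypothetical helpers, not HRM-Text code)."""


def gpu_hours(num_gpus: int, wall_clock_hours: float) -> float:
    """Total GPU-hours consumed by a run."""
    return num_gpus * wall_clock_hours


def implied_rate(num_gpus: int, wall_clock_hours: float, total_usd: float) -> float:
    """Price per GPU-hour implied by a quoted total cost."""
    return total_usd / gpu_hours(num_gpus, wall_clock_hours)


def pretraining_cost(num_gpus: int, wall_clock_hours: float, usd_per_gpu_hour: float) -> float:
    """Estimated cost of a run at a given price per GPU-hour."""
    return gpu_hours(num_gpus, wall_clock_hours) * usd_per_gpu_hour


if __name__ == "__main__":
    # Figures quoted above for the two published model sizes: (GPUs, hours, USD).
    runs = {"Size L (0.6B)": (8, 50, 800), "Size XL (1B)": (16, 46, 1472)}
    for name, (gpus, hours, usd) in runs.items():
        print(f"{name}: {gpu_hours(gpus, hours):.0f} GPU-hours, "
              f"${implied_rate(gpus, hours, usd):.2f}/GPU-hour")
```

For example, `pretraining_cost(8, 50, 3.0)` returns 1200.0, the Size L run at a hypothetical $3 per GPU-hour.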

Highlighted Details

  • Reports strong benchmark results, e.g., 84.7% on GSM8K and 60.7% on MMLU for the 1B XL model.
  • Provides a complete pretraining framework with tooling for data preparation, training, evaluation, and checkpoint conversion.
  • Optimized for efficient distributed training using PyTorch FSDP2 and FlashAttention 3.

Maintenance & Community

Active development is evident, with ongoing work on native Transformers and vLLM support. No community channels or roadmap links are listed.

Licensing & Compatibility

Released under the Apache License 2.0, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Reliance on FlashAttention 3 necessitates Hopper-class GPUs. Native integration with Transformers and vLLM is pending, requiring conversion steps. Data preparation depends on a companion data_io pipeline.
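Because of the Hopper requirement, it can help to check a machine before launching a run. The sketch below is a hypothetical pre-flight check, not part of HRM-Text: it looks for PyTorch, a visible CUDA device reporting compute capability 9.x (the Hopper generation, e.g. H100), and an importable FlashAttention 3 module. The module name `flash_attn_interface` is an assumption based on how the upstream FlashAttention 3 Hopper build is commonly imported; adjust it to match your install.

```python
"""Hypothetical pre-flight check for the Hopper + FlashAttention 3 requirement."""
import importlib.util
from typing import Dict, Tuple

# Hopper GPUs (e.g. H100) report CUDA compute capability 9.x.
HOPPER_MAJOR = 9
# Assumed import name of the FlashAttention 3 Hopper build; may differ per install.
FA3_MODULE = "flash_attn_interface"


def is_hopper(capability: Tuple[int, int]) -> bool:
    """True if a (major, minor) compute capability belongs to the Hopper generation."""
    return capability[0] == HOPPER_MAJOR


def module_available(name: str) -> bool:
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def check_environment() -> Dict[str, bool]:
    """Report which prerequisites are present on this machine."""
    report = {
        "torch": module_available("torch"),
        "flash_attn_3": module_available(FA3_MODULE),
        "cuda": False,
        "hopper": False,
    }
    if report["torch"]:
        import torch  # imported lazily so the check still runs without PyTorch

        if torch.cuda.is_available():
            report["cuda"] = True
            # Only the first visible device is inspected.
            report["hopper"] = is_hopper(torch.cuda.get_device_capability(0))
    return report


if __name__ == "__main__":
    for item, ok in check_environment().items():
        print(f"{item:>12}: {'ok' if ok else 'missing'}")
```

The check deliberately accepts only capability 9.x, matching the Hopper-only requirement stated above, and uses `importlib.util.find_spec` so that a missing package is reported instead of raising an import error.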

Health Check
Last Commit

9 hours ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
6
Star History
777 stars in the last 9 days

Explore Similar Projects

Starred by Wing Lian (Founder of Axolotl AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

fms-fsdp by foundation-model-stack

0%
286 stars
Efficiently train foundation models with PyTorch
Created 2 years ago
Updated 6 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Tri Dao (Chief Scientist at Together AI), and 1 more.

hnet by goombalab

0.1%
852 stars
Hierarchical sequence modeling with dynamic chunking
Created 10 months ago
Updated 6 months ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 25 more.

gpt-neox by EleutherAI

0.0%
7k stars
Framework for training large-scale autoregressive language models
Created 5 years ago
Updated 1 week ago