HRM-Text by sapientinc

Efficient text generation model pretraining framework

Created 1 month ago

1,645 stars

Top 24.7% on SourcePulse

5 Experts Love This Project

Phil Wang

Prolific Research Paper Implementer

Jeff Hammerbacher

Cofounder of Cloudera

Victor Taelin

Author of Bend, Kind, HVM

Wing Lian

Founder of Axolotl AI

and 1 more!

Project Summary

Summary

HRM-Text offers an accessible framework for pretraining foundation models from scratch, drastically reducing compute and data needs. Targeting researchers and engineers, it enables LLM development with costs as low as ~$1000, democratizing access to large-scale model creation.

How It Works

This project uses the Hierarchical Reasoning Model (HRM), a hierarchical recurrent architecture, combined with PrefixLM sequence packing and FlashAttention 3 kernels for efficient processing. Distributed training runs on PyTorch's FSDP2. According to the project, this approach achieves pretraining with orders of magnitude less compute and data than traditional scaling methods.
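To make the PrefixLM sequence packing idea concrete, the sketch below builds the attention mask for several sequences packed into one row. It illustrates the general technique only and is not taken from the HRM-Text code: the function name `packed_prefix_lm_mask` and the `(prefix_len, target_len)` segment format are assumptions made for this example. Within each segment, prefix tokens are visible to every token of that segment, target tokens are visible causally, and nothing is visible across segments.

```python
from typing import List, Tuple


def packed_prefix_lm_mask(segments: List[Tuple[int, int]]) -> List[List[bool]]:
    """Build a boolean attention mask for sequences packed into one row.

    Hypothetical helper for illustration. Each segment is given as
    (prefix_len, target_len). mask[q][k] is True when query position q
    may attend to key position k.
    """
    total = sum(prefix_len + target_len for prefix_len, target_len in segments)
    mask = [[False] * total for _ in range(total)]
    start = 0
    for prefix_len, target_len in segments:
        prefix_end = start + prefix_len
        end = prefix_end + target_len
        for q in range(start, end):
            for k in range(start, end):
                # Prefix keys are visible to all tokens in the same segment
                # (bidirectional); target keys are visible only causally.
                if k < prefix_end or k <= q:
                    mask[q][k] = True
        start = end  # the next segment begins where this one ends
    return mask


# Two packed sequences: (2 prefix + 2 target) and (1 prefix + 2 target).
mask = packed_prefix_lm_mask([(2, 2), (1, 2)])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

The printed grid is block-diagonal: each packed sequence attends only to itself, which is what lets short sequences share one training row without leaking context. A real implementation would pass equivalent segment boundaries to the attention kernel rather than materialize a dense mask.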

Quick Start & Requirements

Installation: Docker is recommended (docker run --gpus all --ipc=host --network=host -it -v "$PWD":/workspace sapientai/hrm-text:latest). Alternatively, pip install -r requirements.txt after setting up PyTorch, CUDA, and FlashAttention 3.
Prerequisites: Hopper-class GPUs are expected because of FlashAttention 3. CUDA, PyTorch, and FlashAttention 3 are required. Multi-node setups need NCCL verification. A Weights & Biases account is needed for experiment tracking.
Resource Footprint: Pretraining a 0.6B model (Size L) requires 8 H100 GPUs for ~50 hours (~$800). A 1B model (Size XL) needs 16 H100s for ~46 hours (~$1472). Evaluation typically uses one 80GB GPU.
Links: Docker image: sapientai/hrm-text:latest. Paper: https://arxiv.org/abs/2605.20613.
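
The resource figures above can be checked with simple arithmetic. The sketch below assumes a flat rate of $2 per H100 GPU-hour; that rate is not stated in the source but is what both quoted totals imply (8 GPUs x 50 hours = 400 GPU-hours for ~$800). The function name `pretraining_cost` is hypothetical.

```python
from typing import Tuple


def pretraining_cost(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> Tuple[float, float]:
    """Return (GPU-hours, estimated cost in USD) for a training run."""
    gpu_hours = num_gpus * hours
    return gpu_hours, gpu_hours * usd_per_gpu_hour


# Assumption: $2 per GPU-hour, inferred from the quoted totals, not a published price.
RATE_USD_PER_GPU_HOUR = 2.0

for name, gpus, hours in [("L (0.6B)", 8, 50), ("XL (1B)", 16, 46)]:
    gpu_hours, cost = pretraining_cost(gpus, hours, RATE_USD_PER_GPU_HOUR)
    print(f"Size {name}: {gpu_hours} GPU-hours, ~${cost:.0f}")
```

Actual cost depends on the cloud provider's hourly rate and on how closely a run matches the quoted wall-clock times, so treat these numbers as estimates.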

Highlighted Details

Demonstrates strong benchmark performance, e.g., 84.7% GSM8k and 60.7% MMLU for the 1B XL model.
Provides a complete pretraining framework with tooling for data preparation, training, evaluation, and checkpoint conversion.
Optimized for efficient distributed training using PyTorch FSDP2 and FlashAttention 3.

Maintenance & Community

Development is ongoing, including work on native Transformers and vLLM support. No community channels or roadmap links are listed.

Licensing & Compatibility

Released under the Apache License 2.0, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Reliance on FlashAttention 3 necessitates Hopper-class GPUs. Native integration with Transformers and vLLM is pending, requiring conversion steps. Data preparation depends on a companion data_io pipeline.

Health Check

Last Commit

3 weeks ago

Responsiveness

Inactive

Star History

456 stars in the last 30 days