Pretraining code for depth-recurrent language model research
This repository provides the pretraining code for a large-scale depth-recurrent language model, specifically the huginn-0125 model. It is targeted at researchers and engineers interested in replicating or understanding the training process of such models, particularly on large-scale AMD GPU clusters, and offers insights into overcoming hardware-specific challenges.
How It Works
The project implements a depth-recurrent architecture designed for large-scale language model pretraining. It leverages a custom parallelism implementation (SimpleFabric) and an _allreduce_chunk_stream method for inter-node communication, specifically to address RCCL hangs encountered on AMD systems. Training is orchestrated via train.py, with model definitions in recpre/model_dynamic.py and launch configurations in launch_configs/.
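To make the communication workaround concrete, here is a minimal sketch of the general technique of splitting one large gradient all-reduce into many small collectives so that no single RCCL call becomes huge. The helper name, chunk size, and the use of the default stream are illustrative assumptions, not the repository's actual _allreduce_chunk_stream implementation.

```python
# Minimal sketch, not the repository's implementation: reduce gradients in
# fixed-size chunks so each collective stays small. Names and the chunk size
# are assumptions; assumes torch.distributed is already initialized.
import torch
import torch.distributed as dist

def allreduce_in_chunks(tensors, chunk_numel=2_000_000):
    """Average a list of tensors across ranks using several small all-reduces."""
    flat = torch.cat([t.reshape(-1) for t in tensors])   # one flat buffer
    for start in range(0, flat.numel(), chunk_numel):
        chunk = flat[start:start + chunk_numel]           # view into the buffer
        dist.all_reduce(chunk, op=dist.ReduceOp.SUM)      # one small collective per chunk
    flat.div_(dist.get_world_size())                      # sum -> mean
    offset = 0
    for t in tensors:                                     # copy averaged values back
        n = t.numel()
        t.copy_(flat[offset:offset + n].view_as(t))
        offset += n
```

The method name in the repository suggests the real version also runs on a dedicated communication stream; the sketch above omits that detail for brevity.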
Quick Start & Requirements
Training is launched with:
```
python train.py --config=launch_configs/your_config.yaml
```
Requirements include a PyTorch environment (the code builds on a litgpt base), Python, and potentially specific libraries like bpeasy for tokenizer generation.
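As a quick orientation, the following hedged sketch writes a toy launch config and runs the documented entry point against it; every field name is an assumption for illustration, not the repository's actual schema.

```python
# Hedged sketch: create a toy launch config, then invoke train.py on it.
# All config field names below are hypothetical, not the real schema.
import subprocess
import yaml

toy_config = {
    "model_name": "depth-recurrent-small",   # hypothetical architecture preset
    "micro_batch_size": 4,
    "max_steps": 100,
}
with open("launch_configs/toy_config.yaml", "w") as f:
    yaml.safe_dump(toy_config, f)

# matches the documented CLI form: train.py --config=<path to yaml>
subprocess.run(["python", "train.py", "--config=launch_configs/toy_config.yaml"], check=True)
```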
Highlighted Details
The custom _allreduce_chunk_stream handles inter-node communication to mitigate RCCL hangs. A minimal standalone modeling implementation is included (recpre/raven_modeling_minimal.py); a schematic of the depth-recurrent idea is sketched below. Evaluation uses the lm-eval harness and bigcode for code tasks.
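To illustrate what depth recurrence means architecturally, here is a simplified, hedged sketch: a shared core block is applied a configurable number of times between fixed input and output layers, with the input embedding re-injected at every step. The layer split, the zero-initialized state, and the injection-by-addition are simplifying assumptions, not a faithful copy of recpre/raven_modeling_minimal.py.

```python
# Schematic sketch of a depth-recurrent decoder: one shared "core" block is
# iterated num_steps times, so compute depth scales without adding parameters.
# Simplified illustration only, not the repository's model.
import torch
import torch.nn as nn

class TinyDepthRecurrentLM(nn.Module):
    def __init__(self, vocab_size=256, dim=64, num_steps=8):
        super().__init__()
        self.num_steps = num_steps
        self.embed = nn.Embedding(vocab_size, dim)
        self.prelude = nn.Linear(dim, dim)          # stand-in for a few input blocks
        self.core = nn.TransformerEncoderLayer(     # the block reused across depth
            d_model=dim, nhead=4, batch_first=True
        )
        self.coda = nn.Linear(dim, vocab_size)      # stand-in for output blocks + head

    def forward(self, tokens, num_steps=None):
        steps = num_steps or self.num_steps
        e = self.prelude(self.embed(tokens))        # fixed input processing
        state = torch.zeros_like(e)                 # latent state iterated in depth
        for _ in range(steps):                      # recurrence over depth, shared weights
            state = self.core(state + e)            # re-inject the input every step
        return self.coda(state)                     # logits

logits = TinyDepthRecurrentLM()(torch.randint(0, 256, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 256])
```

The appeal of this structure is that compute can be scaled by raising num_steps without changing the parameter count.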
Maintenance & Community
The project is authored by a team including Jonas Geiping, John Kirchenbauer, and others from the tomg-group at the University of Maryland (UMD). The authors encourage users to open issues for questions or further details.
Licensing & Compatibility
Released under the Apache-2.0 license. Some code is also licensed under the Lightning AI Apache-2.0 license. This license is permissive and generally compatible with commercial use.
Limitations & Caveats
The README explicitly states that this implementation may not be ideal for users wanting to pretrain their own models, suggesting it is best treated as a reference. The data preparation scripts are noted as not highly scalable, time-consuming to run, and susceptible to breaking changes in external datasets.