hnet by goombalab

Hierarchical sequence modeling with dynamic chunking

created 1 month ago
654 stars

Top 51.0% on SourcePulse

View on GitHub
Project Summary

H-Net introduces a hierarchical sequence modeling architecture designed for efficient processing of long sequences. Targeting researchers and practitioners in natural language processing and sequence modeling, it offers a dynamic chunking mechanism that learns segmentation end to end, improving performance and scalability over fixed tokenization and fixed-size chunking.

How It Works

H-Net employs a dynamic chunking mechanism that learns where to place chunk boundaries and groups the input into progressively coarser chunks at each level of the hierarchy. This lets the model capture dependencies at multiple granularities and model long-range relationships more effectively. The architecture is modular, combining dynamic chunking modules with isotropic (non-hierarchical) components, which provides flexibility in how hierarchies are composed.
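
To make the chunking idea concrete, here is a minimal PyTorch sketch of a boundary-prediction (routing) step in the spirit of dynamic chunking: adjacent hidden states are compared, and positions where the representation changes sharply are treated as chunk boundaries. The module, names, and shapes below are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyBoundaryRouter(nn.Module):
    """Toy sketch of a dynamic-chunking routing step (illustrative only).

    Predicts a boundary probability for each position from the cosine
    similarity between projections of adjacent hidden states, then keeps
    positions whose probability crosses a threshold. This is a simplified
    stand-in, not the repo's API.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model) hidden states from the fine-grained level
        q = self.q_proj(x)                      # projection for position t
        k = self.k_proj(x)                      # projection for position t-1
        cos = F.cosine_similarity(q[:, 1:], k[:, :-1], dim=-1)
        p = 0.5 * (1.0 - cos)                   # dissimilar neighbors -> likely boundary
        # the first position always starts a new chunk
        p = torch.cat([torch.ones_like(p[:, :1]), p], dim=1)
        boundaries = p >= 0.5                   # hard boundary decision per position
        return p, boundaries

# usage: route a batch of hidden states and count the resulting chunks
router = ToyBoundaryRouter(d_model=64)
h = torch.randn(2, 16, 64)
probs, is_boundary = router(h)
print(is_boundary.sum(dim=1))  # number of chunks per sequence
```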

Quick Start & Requirements

  • Install via pip: pip install -e . after cloning the repository.
  • PyTorch version >= 2.5.1 is required.
  • Building mamba_ssm from source is strongly recommended: clone state-spaces/mamba, cd into the mamba directory, and run pip install . from there.
  • Pretrained models are available on Hugging Face; see the download sketch after this list.
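
For fetching one of the pretrained checkpoints, a minimal sketch using the huggingface_hub client follows. The repo_id and filename below are placeholders (assumptions); substitute the actual names listed on the project's Hugging Face page.

```python
from huggingface_hub import hf_hub_download

# Minimal sketch: fetch a pretrained H-Net checkpoint from Hugging Face.
ckpt_path = hf_hub_download(
    repo_id="goombalab/hnet_2stage_XL",   # placeholder repo id (assumption)
    filename="hnet_2stage_XL.pt",         # placeholder checkpoint filename (assumption)
)
print(ckpt_path)  # local path to the downloaded checkpoint, usable with generate.py
```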

Highlighted Details

  • Offers 1-stage and 2-stage hierarchical models (e.g., hnet_1stage_L, hnet_2stage_XL).
  • Models are trained on a 100B-token subset of FineWeb-Edu, with compute matching GPT-3 Large/XL.
  • Provides weights for Chinese and Code tasks, trained on 46B-token subsets.
  • Includes generate.py for text generation with pretrained checkpoints; see the sketch after this list.
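
H-Net models are tokenizer-free and operate directly on raw UTF-8 bytes, so prompts for generation are byte sequences rather than tokenizer ids. The sketch below shows only that byte-level round trip; the model.generate call is a hypothetical stand-in, not the repo's actual interface, so consult generate.py for the real entry point and flags.

```python
import torch

def bytes_to_tensor(prompt: str) -> torch.Tensor:
    # Byte-level models consume raw UTF-8 bytes (values 0-255), not tokenizer ids.
    return torch.tensor(list(prompt.encode("utf-8")), dtype=torch.long).unsqueeze(0)

def tensor_to_text(ids: torch.Tensor) -> str:
    # Decode generated byte ids back to text; malformed sequences are replaced.
    return bytes(ids.squeeze(0).tolist()).decode("utf-8", errors="replace")

prompt_ids = bytes_to_tensor("Hierarchical sequence modeling")
# generated_ids = model.generate(prompt_ids, max_new_tokens=64)  # hypothetical interface
# print(tensor_to_text(generated_ids))
print(prompt_ids.shape)  # (1, number_of_bytes)
```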

Maintenance & Community

The project is associated with goombalab and authors Sukjun Hwang, Brandon Wang, and Albert Gu. Further details and model specifics are available in the linked paper and configuration files.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README does not specify any explicit limitations or known issues. Users should consult the associated paper for a comprehensive understanding of the model's capabilities and constraints.

Health Check
Last commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 2
Issues (30d): 5
Star History: 264 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Shizhe Diao (Research Scientist at NVIDIA; Author of LMFlow), and 3 more.

LongLoRA by dvlab-research

Top 0.1% on SourcePulse
3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Vincent Weisser (Cofounder of Prime Intellect), and 11 more.

codellama by meta-llama

Top 0.0% on SourcePulse
16k stars
Inference code for CodeLlama models
created 2 years ago
updated 1 year ago