hnet by goombalab

Hierarchical sequence modeling with dynamic chunking

Created 3 months ago
750 stars

Top 46.3% on SourcePulse

Project Summary

H-Net introduces a hierarchical sequence modeling architecture designed for efficient processing of long sequences. Aimed at researchers and practitioners in natural language processing and sequence modeling, it uses a dynamic chunking mechanism intended to improve performance and scalability over conventional approaches that rely on fixed, heuristic tokenization.

How It Works

H-Net uses a dynamic chunking mechanism that learns where to segment a sequence into variable-length chunks, and this chunking can be applied recursively to build multiple levels of hierarchy. Modeling at several granularities is intended to help the model capture long-range dependencies more effectively. The architecture is assembled from modular components, including dynamic chunking modules and isotropic (non-hierarchical) components, which gives flexibility in design.
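To make the chunking idea concrete, here is a minimal, dependency-free sketch of one chunking stage. It assumes a cosine-similarity boundary rule similar to the routing module described in the H-Net paper: a position opens a new chunk when its state differs sharply from the previous one. The function names (boundary_probs, chunk, dechunk) are hypothetical and are not the repository's API. A trainable model would also need learned projections, smoothing, and gradient handling, all of which this sketch omits.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def boundary_probs(states: Sequence[Vector]) -> List[float]:
    """Assumed routing rule: p_t = (1 - cos(h_t, h_{t-1})) / 2.

    Dissimilar neighbours give a high probability of starting a new chunk.
    The first position always opens a chunk. A trained model would apply
    learned projections to the states first; this sketch uses them as-is.
    """
    if not states:
        return []
    probs = [1.0]
    for t in range(1, len(states)):
        probs.append(0.5 * (1.0 - cosine(states[t], states[t - 1])))
    return probs


def chunk(states: Sequence[Vector], threshold: float = 0.5) -> Tuple[List[Vector], List[bool]]:
    """Downsample: keep only the states at positions flagged as boundaries."""
    boundaries = [p >= threshold for p in boundary_probs(states)]
    chunks = [list(s) for s, is_start in zip(states, boundaries) if is_start]
    return chunks, boundaries


def dechunk(chunks: Sequence[Vector], boundaries: Sequence[bool]) -> List[Vector]:
    """Upsample: repeat each chunk vector over the positions it covers,
    restoring the original sequence length."""
    if boundaries and not boundaries[0]:
        raise ValueError("the first position must start a chunk")
    out: List[Vector] = []
    idx = -1
    for is_start in boundaries:
        if is_start:
            idx += 1
        out.append(list(chunks[idx]))
    return out


# Usage: two pairs of similar states followed by a dissimilar final state.
states = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1], [0.0, 1.0]]
chunks, boundaries = chunk(states)
print(boundaries)                         # [True, False, True, False, True]
print(chunks)                             # [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]
print(len(dechunk(chunks, boundaries)))   # 5
```

Applying chunk again to the compressed output would add a second level, which is the sense in which the hierarchy can be nested.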

Quick Start & Requirements

  • Install via pip: after cloning the repository, run pip install -e . from its root.
  • PyTorch version >= 2.5.1 is required.
  • Building mamba_ssm from source is strongly recommended: clone state-spaces/mamba, then run pip install . from inside the mamba directory.
  • Pretrained models are available on Hugging Face.
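Before installing, it can help to confirm that the environment meets the stated PyTorch minimum and that mamba_ssm is present. The helper below is a hypothetical sketch, not part of the repository. It assumes the distributions are registered under the names torch and mamba_ssm, and its version comparison reads only the numeric release (so pre-release tags such as rc1 are ignored).

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_TORCH = "2.5.1"  # minimum PyTorch version stated in the Quick Start


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"(\d+(?:\.\d+)*)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    have, need = parse_version(version), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    torch_version = installed_version("torch")
    if torch_version is None:
        print("PyTorch is not installed")
    elif not meets_minimum(torch_version, MIN_TORCH):
        print(f"PyTorch {torch_version} is older than {MIN_TORCH}")
    else:
        print(f"PyTorch {torch_version} OK")
    # Assumption: mamba_ssm's distribution is published under the name "mamba_ssm".
    mamba_version = installed_version("mamba_ssm")
    print("mamba_ssm:", mamba_version or "not found (build it from source)")
```

For production tooling, a full PEP 440 comparison (for example via the packaging library) would handle pre-releases correctly. The hand-rolled parser keeps this sketch dependency-free.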

Highlighted Details

  • Offers 1-stage and 2-stage hierarchical models (e.g., hnet_1stage_L, hnet_2stage_XL).
  • Models are trained on a 100B-token subset of FineWeb-Edu, with compute matching GPT-3 Large/XL.
  • Provides weights for Chinese and Code tasks, trained on 46B-token subsets.
  • Includes generate.py for text generation with pretrained checkpoints.
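The separate Chinese and Code checkpoints are easier to motivate with a concrete view of the input. Assuming the models consume raw UTF-8 bytes, as in the tokenizer-free setup described in the H-Net paper (confirm against generate.py for the exact encoding), the sketch below counts characters against bytes. In this sample, each Chinese character spans three bytes, while English and ASCII code use one byte per character, so a byte-level model has to learn which bytes belong together. The helper names are illustrative only.

```python
from typing import Dict, List, Tuple


def to_byte_ids(text: str) -> List[int]:
    """Encode text as UTF-8 byte values in the range 0-255 (no learned vocabulary)."""
    return list(text.encode("utf-8"))


def byte_stats(samples: Dict[str, str]) -> Dict[str, Tuple[int, int]]:
    """Map each sample name to (number of characters, number of UTF-8 bytes)."""
    return {name: (len(text), len(to_byte_ids(text))) for name, text in samples.items()}


samples = {
    "English": "hello world",
    "Chinese": "你好世界",
    "Code": "def f(x): return x",
}
for name, (n_chars, n_bytes) in byte_stats(samples).items():
    print(f"{name:8s} chars={n_chars:3d} bytes={n_bytes:3d}")
```

Running it reports 11 characters and 11 bytes for the English sample, 4 characters and 12 bytes for the Chinese sample, and 18 characters and 18 bytes for the code sample.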

Maintenance & Community

The project is associated with goombalab and authors Sukjun Hwang, Brandon Wang, and Albert Gu. Further details and model specifics are available in the linked paper and configuration files.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README does not specify any explicit limitations or known issues. Users should consult the associated paper for a comprehensive understanding of the model's capabilities and constraints.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 40 stars in the last 30 days
