hnet by goombalab

Hierarchical sequence modeling with dynamic chunking

created 1 month ago
654 stars

Top 51.0% on SourcePulse

View on GitHub
Project Summary

H-Net introduces a hierarchical sequence modeling architecture designed for efficient processing of long sequences. Targeting researchers and practitioners in natural language processing and sequence modeling, it offers a dynamic chunking mechanism that learns segmentation end to end, improving performance and scalability over fixed tokenization and fixed-size chunking.

How It Works

H-Net employs a dynamic chunking mechanism that learns where to place chunk boundaries and groups the input into progressively coarser chunks at each level of the hierarchy. This lets the model capture dependencies at multiple granularities and model long-range relationships more effectively. The architecture is modular, combining dynamic chunking modules with isotropic (non-hierarchical) components, which provides flexibility in how hierarchies are composed.
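
To make the chunking idea concrete, here is a minimal PyTorch sketch of a boundary-prediction (routing) step in the spirit of dynamic chunking: adjacent hidden states are compared, and positions where the representation changes sharply are treated as chunk boundaries. The module, names, and shapes below are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyBoundaryRouter(nn.Module):
    """Toy sketch of a dynamic-chunking routing step (illustrative only).

    Predicts a boundary probability for each position from the cosine
    similarity between projections of adjacent hidden states, then keeps
    positions whose probability crosses a threshold. This is a simplified
    stand-in, not the repo's API.
    """

    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor):
        # x: (batch, seq_len, d_model) hidden states from the fine-grained level
        q = self.q_proj(x)                      # projection for position t
        k = self.k_proj(x)                      # projection for position t-1
        cos = F.cosine_similarity(q[:, 1:], k[:, :-1], dim=-1)
        p = 0.5 * (1.0 - cos)                   # dissimilar neighbors -> likely boundary
        # the first position always starts a new chunk
        p = torch.cat([torch.ones_like(p[:, :1]), p], dim=1)
        boundaries = p >= 0.5                   # hard boundary decision per position
        return p, boundaries

# usage: route a batch of hidden states and count the resulting chunks
router = ToyBoundaryRouter(d_model=64)
h = torch.randn(2, 16, 64)
probs, is_boundary = router(h)
print(is_boundary.sum(dim=1))  # number of chunks per sequence
```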

Quick Start & Requirements

  • Install via pip: pip install -e . after cloning the repository.
  • PyTorch version >= 2.5.1 is required.
  • Building mamba_ssm from source is strongly recommended: clone state-spaces/mamba, cd into the mamba directory, and run pip install . from there.
  • Pretrained models are available on Hugging Face; see the download sketch after this list.
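
For fetching one of the pretrained checkpoints, a minimal sketch using the huggingface_hub client follows. The repo_id and filename below are placeholders (assumptions); substitute the actual names listed on the project's Hugging Face page.

```python
from huggingface_hub import hf_hub_download

# Minimal sketch: fetch a pretrained H-Net checkpoint from Hugging Face.
ckpt_path = hf_hub_download(
    repo_id="goombalab/hnet_2stage_XL",   # placeholder repo id (assumption)
    filename="hnet_2stage_XL.pt",         # placeholder checkpoint filename (assumption)
)
print(ckpt_path)  # local path to the downloaded checkpoint, usable with generate.py
```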

Highlighted Details

  • Offers 1-stage and 2-stage hierarchical models (e.g., hnet_1stage_L, hnet_2stage_XL).
  • Models are trained on a 100B-token subset of FineWeb-Edu, with compute matching GPT-3 Large/XL.
  • Provides weights for Chinese and Code tasks, trained on 46B-token subsets.
  • Includes generate.py for text generation with pretrained checkpoints; see the sketch after this list.
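
H-Net models are tokenizer-free and operate directly on raw UTF-8 bytes, so prompts for generation are byte sequences rather than tokenizer ids. The sketch below shows only that byte-level round trip; the model.generate call is a hypothetical stand-in, not the repo's actual interface, so consult generate.py for the real entry point and flags.

```python
import torch

def bytes_to_tensor(prompt: str) -> torch.Tensor:
    # Byte-level models consume raw UTF-8 bytes (values 0-255), not tokenizer ids.
    return torch.tensor(list(prompt.encode("utf-8")), dtype=torch.long).unsqueeze(0)

def tensor_to_text(ids: torch.Tensor) -> str:
    # Decode generated byte ids back to text; malformed sequences are replaced.
    return bytes(ids.squeeze(0).tolist()).decode("utf-8", errors="replace")

prompt_ids = bytes_to_tensor("Hierarchical sequence modeling")
# generated_ids = model.generate(prompt_ids, max_new_tokens=64)  # hypothetical interface
# print(tensor_to_text(generated_ids))
print(prompt_ids.shape)  # (1, number_of_bytes)
```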

Maintenance & Community

The project is associated with goombalab and authors Sukjun Hwang, Brandon Wang, and Albert Gu. Further details and model specifics are available in the linked paper and configuration files.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The README does not specify any explicit limitations or known issues. Users should consult the associated paper for a comprehensive understanding of the model's capabilities and constraints.

Health Check
Last commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 2
Issues (30d): 5
Star History: 264 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Shizhe Diao (Research Scientist at NVIDIA; Author of LMFlow), and 3 more.

LongLoRA by dvlab-research

Top 0.1% on SourcePulse
3k stars
LongLoRA: Efficient fine-tuning for long-context LLMs
created 1 year ago
updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Vincent Weisser (Cofounder of Prime Intellect), and 11 more.

codellama by meta-llama

Top 0.0% on SourcePulse
16k stars
Inference code for CodeLlama models
created 2 years ago
updated 1 year ago