tokenbender: Research implementation of manifold-constrained hyper-connections for deep learning models
This repository provides a research implementation of Manifold-Constrained Hyper-Connections (mHC), a novel variant of Hyper-Connections designed for transformer architectures. It offers a clear and correct PyTorch implementation for researchers and engineers to experiment with mHC's unique layer update mechanism, which imposes specific manifold constraints on connection matrices, potentially leading to improved model performance or efficiency.
How It Works
The core of mHC is its layer update rule: x_{l+1} = H_l^{res} x_l + (H_l^{post})^T F(H_l^{pre} x_l, W_l). Two constraints are enforced: H_res must be doubly stochastic (i.e., lie on the Birkhoff polytope), which is achieved via the Sinkhorn-Knopp algorithm, while H_pre and H_post are non-negative mixing maps. The implementation uses static per-layer matrices: it learns H_res_logits and projects them onto the constraint set, and maps H_pre_logits/H_post_logits to non-negative weights via softmax. This approach prioritizes research clarity over system-level optimizations.
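To make the update rule concrete, here is a minimal PyTorch sketch of one mHC layer. It is an illustration, not the repository's code: the class name MHCLayer, the stream/batch shapes, the single Linear standing in for the block F, and the treatment of H_pre/H_post as per-stream mixing vectors are all assumptions.

import torch
import torch.nn.functional as F

def sinkhorn_knopp(logits: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    # Project a logit matrix onto (approximately) the Birkhoff polytope
    # by alternately normalizing rows and columns of its exponential.
    M = logits.exp()
    for _ in range(n_iters):
        M = M / M.sum(dim=-1, keepdim=True)  # rows sum to 1
        M = M / M.sum(dim=-2, keepdim=True)  # columns sum to 1
    return M

class MHCLayer(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_streams: int, d_model: int):
        super().__init__()
        self.H_res_logits = torch.nn.Parameter(torch.zeros(n_streams, n_streams))
        self.H_pre_logits = torch.nn.Parameter(torch.zeros(n_streams))
        self.H_post_logits = torch.nn.Parameter(torch.zeros(n_streams))
        self.block = torch.nn.Linear(d_model, d_model)  # stand-in for F(., W_l)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_streams, batch, d_model) residual streams
        H_res = sinkhorn_knopp(self.H_res_logits)        # doubly stochastic
        h_pre = F.softmax(self.H_pre_logits, dim=0)      # non-negative read map
        h_post = F.softmax(self.H_post_logits, dim=0)    # non-negative write map
        mixed = torch.einsum("ij,jbd->ibd", H_res, x)    # H_res x_l
        pooled = torch.einsum("j,jbd->bd", h_pre, x)     # H_pre x_l
        out = self.block(pooled)                         # F(H_pre x_l, W_l)
        return mixed + h_post[:, None, None] * out       # + H_post^T F(...)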
Quick Start & Requirements
Training can be initiated from the examples/nanogpt/ directory using provided configuration files. Example commands include:
python train.py config/train_fineweb10B.py
python train.py config/train_fineweb10B_mhc.py
Multi-GPU training is demonstrated using torchrun. The primary dataset is FineWeb10B.
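For reference, a multi-GPU launch with torchrun typically looks like the line below; the GPU count and flags are illustrative, not taken from the repository.

torchrun --standalone --nproc_per_node=8 train.py config/train_fineweb10B_mhc.py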
Highlighted Details
H_res projection method using Newton-Schulz iteration; see the sketch below.
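As a hedged sketch of how such a projection could work (the repository's exact procedure is not described here): the Newton-Schulz iteration pushes a matrix toward its nearest orthogonal factor, and squaring an orthogonal matrix elementwise yields an orthostochastic, hence doubly stochastic, matrix. The function names and the elementwise-squaring step are assumptions.

import torch

def newton_schulz_orthogonalize(A: torch.Tensor, n_iters: int = 5) -> torch.Tensor:
    # Iterate X <- 1.5*X - 0.5*X X^T X toward the nearest orthogonal matrix.
    X = A / A.norm()  # Frobenius normalization keeps the spectral norm <= 1
    for _ in range(n_iters):
        X = 1.5 * X - 0.5 * X @ X.mT @ X
    return X

def orthostochastic_h_res(logits: torch.Tensor) -> torch.Tensor:
    Q = newton_schulz_orthogonalize(logits)
    # Rows and columns of an orthogonal matrix have unit norm, so its
    # elementwise square is doubly stochastic (orthostochastic).
    return Q * Q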
Maintenance & Community
No specific details regarding maintainers, community channels (such as Discord or Slack), or a public roadmap were found in the provided README.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license. This license is generally permissive and compatible with commercial use and closed-source linking.
Limitations & Caveats
The implementation is explicitly a research prototype, prioritizing correctness over system performance optimizations. Several planned next steps, such as alternative orthogonalization operations or U-net variants, are not yet implemented. The orthostochastic option requires careful configuration of specific parameters.