mHC-manifold-constrained-hyper-connections by tokenbender

Research implementation of manifold-constrained hyper-connections for deep learning models

Created 3 weeks ago

269 stars

Top 95.8% on SourcePulse

Project Summary

This repository provides a research implementation of Manifold-Constrained Hyper-Connections (mHC), a variant of Hyper-Connections designed for transformer architectures. It offers a clear, correctness-focused PyTorch implementation that lets researchers and engineers experiment with mHC's layer update mechanism, which constrains the connection matrices to specific manifolds with the aim of improving model performance or efficiency.

How It Works

The core of mHC is its layer update rule:

    x_{l+1} = H_l^{res} x_l + (H_l^{post})^T F(H_l^{pre} x_l, W_l)

Two constraints are enforced. H_res is a doubly stochastic matrix (a point in the Birkhoff polytope) obtained with the Sinkhorn-Knopp algorithm, while H_pre and H_post are non-negative mixing maps. The implementation uses static per-layer matrices: it learns H_res_logits and projects them onto the doubly stochastic set, and it maps H_pre_logits and H_post_logits to non-negative weights with a function such as softmax. This approach prioritizes research clarity over system-level optimizations.
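A minimal pure-Python sketch of the update rule and the two projections described above. It assumes n residual streams of width C (x_l is n × C, H_res is n × n, H_pre and H_post are length-n weight vectors) and softmax for the non-negative maps. The names `softmax`, `sinkhorn_knopp`, and `mhc_update`, the default of 20 Sinkhorn iterations, the identity-biased logits, and the toy `layer_fn` standing in for F(·, W_l) are all illustrative assumptions, not the repository's PyTorch API.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]


def softmax(logits: Vector) -> Vector:
    """Map unconstrained logits to non-negative weights that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sinkhorn_knopp(logits: Matrix, n_iters: int = 20) -> Matrix:
    """Project n x n logits to an approximately doubly stochastic matrix."""
    m = max(max(row) for row in logits)
    # exp() makes every entry strictly positive, as Sinkhorn-Knopp requires.
    p = [[math.exp(v - m) for v in row] for row in logits]
    n = len(p)
    for _ in range(n_iters):
        # Normalize each row to sum to 1 ...
        row_sums = [sum(row) for row in p]
        p = [[v / s for v in row] for row, s in zip(p, row_sums)]
        # ... then each column. Columns are exact after this step; rows converge.
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p


def mhc_update(x: Matrix, h_res: Matrix, h_pre: Vector, h_post: Vector,
               layer_fn: Callable[[Vector], Vector]) -> Matrix:
    """x_{l+1} = H_res x_l + H_post^T F(H_pre x_l) for n streams of width C."""
    n, c = len(x), len(x[0])
    # H_pre x_l: weighted read of the n streams into a single layer input.
    layer_in = [sum(h_pre[i] * x[i][k] for i in range(n)) for k in range(c)]
    y = layer_fn(layer_in)  # hypothetical stand-in for F(., W_l)
    # H_res x_l mixes the streams; H_post^T y writes the output back to each.
    return [[sum(h_res[i][j] * x[j][k] for j in range(n)) + h_post[i] * y[k]
             for k in range(c)] for i in range(n)]


# Toy usage: 4 streams of width 3, identity-biased H_res logits, uniform H_pre/H_post.
n, c = 4, 3
h_res = sinkhorn_knopp([[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)])
h_pre = softmax([0.0] * n)
h_post = softmax([0.0] * n)
x = [[float(i + k) for k in range(c)] for i in range(n)]
x_next = mhc_update(x, h_res, h_pre, h_post, layer_fn=lambda v: [2.0 * t for t in v])
```

With non-negative rows that sum to 1, each output stream of H_res x_l is a convex combination of the input streams, and because the columns also sum to 1 the total across streams is unchanged. In the toy run, the per-feature sum of `x_next` over streams therefore equals the sum over `x` plus the layer output.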

Quick Start & Requirements

Training can be launched from the examples/nanogpt/ directory using the provided configuration files, for example:

    python train.py config/train_fineweb10B.py
    python train.py config/train_fineweb10B_mhc.py

Multi-GPU training is demonstrated using torchrun. The primary dataset is FineWeb10B.

Highlighted Details

  • This is a research prototype focused on correctness and clarity, not system optimizations.
  • Supports experimentation with both 6-layer and 48-layer model configurations.
  • Offers an optional orthostochastic H_res projection method using Newton-Schulz.
  • Leverages code snippets from nanoGPT and lucidrains/hyper-connections.
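For the orthostochastic H_res option listed above, a rough pure-Python sketch under stated assumptions: orthogonalize a learned n × n matrix with the classic cubic Newton-Schulz iteration, then square its entries elementwise. Because every row and column of an orthogonal matrix has unit Euclidean norm, the squared entries form a doubly stochastic matrix without any Sinkhorn iterations. The helper names, the Frobenius-norm prescaling, and the default of 30 iterations are illustrative assumptions; the repository's actual coefficients and parameters may differ.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(a: Matrix, n_iters: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor of a full-rank square matrix."""
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # inside the convergence region of the cubic iteration.
    fro = sum(v * v for row in a for v in row) ** 0.5
    x = [[v / fro for v in row] for row in a]
    for _ in range(n_iters):
        # X <- 1.5 X - 0.5 X X^T X pushes each singular value toward 1
        # while leaving the singular vectors unchanged.
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x


def orthostochastic(a: Matrix, n_iters: int = 30) -> Matrix:
    """Square the entries of Q = orth(a); rows and columns then sum to 1."""
    q = newton_schulz_orthogonalize(a, n_iters)
    return [[v * v for v in row] for row in q]


# Hypothetical 3 x 3 "H_res_logits"-style parameter, projected orthostochastically.
b = orthostochastic([[2.0, -1.0, 0.5], [0.3, 1.5, -0.7], [-0.4, 0.2, 1.8]])
```

With too few iterations, or an input whose smallest singular value is tiny relative to its Frobenius norm, Q is only approximately orthogonal and the squared matrix drifts away from doubly stochastic, which is plausibly why this option needs its parameters chosen with care.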

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or a public roadmap were found in the provided README.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license. This license is generally permissive and compatible with commercial use and closed-source linking.

Limitations & Caveats

The implementation is explicitly a research prototype, prioritizing correctness over system performance optimizations. Several planned next steps, such as alternative orthogonalization operations or U-net variants, are not yet implemented. The orthostochastic option requires careful configuration of specific parameters.

Health Check
Last Commit: 3 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 3
Star History: 270 stars in the last 26 days

Explore Similar Projects

Starred by Amanpreet Singh (Cofounder of Contextual AI) and Ross Taylor (Cofounder of General Reasoning; Cocreator of Papers with Code).

torchshard by kaiyuyue

0%
300
PyTorch engine for tensor slicing into parallel shards
Created 4 years ago
Updated 7 months ago
Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm_training_handbook by huggingface

0.4%
550
Handbook for large language model training methodologies
Created 2 years ago
Updated 1 year ago
Starred by Boris Cherny (Creator of Claude Code; MTS at Anthropic), Omar Sanseviero (DevRel at Google DeepMind), and 12 more.

Paddle by PaddlePaddle

0.1%
24k
Deep learning framework for industrial practice
Created 9 years ago
Updated 1 day ago