tokenbender: Research implementation of manifold-constrained hyper-connections for deep learning models
This repository provides a research implementation of Manifold-Constrained Hyper-Connections (mHC), a novel variant of Hyper-Connections designed for transformer architectures. It offers a clear and correct PyTorch implementation for researchers and engineers to experiment with mHC's unique layer update mechanism, which imposes specific manifold constraints on connection matrices, potentially leading to improved model performance or efficiency.
How It Works
The core of mHC is its layer update rule: x_{l+1} = H_l^{res} x_l + (H_l^{post})^T F(H_l^{pre} x_l, W_l). Two constraints are enforced: H_res must be doubly stochastic (i.e., lie on the Birkhoff polytope), which is achieved via the Sinkhorn-Knopp algorithm, while H_pre and H_post are non-negative mixing maps. The implementation uses static per-layer matrices: it learns H_res_logits and projects them onto the constraint set, and maps H_pre_logits/H_post_logits to non-negative weights via softmax. This approach prioritizes research clarity over system-level optimizations.
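To make the update rule concrete, here is a minimal PyTorch sketch of one mHC layer. It is an illustration, not the repository's code: the class name MHCLayer, the stream/batch shapes, the single Linear standing in for the block F, and the treatment of H_pre/H_post as per-stream mixing vectors are all assumptions.

import torch
import torch.nn.functional as F

def sinkhorn_knopp(logits: torch.Tensor, n_iters: int = 10) -> torch.Tensor:
    # Project a logit matrix onto (approximately) the Birkhoff polytope
    # by alternately normalizing rows and columns of its exponential.
    M = logits.exp()
    for _ in range(n_iters):
        M = M / M.sum(dim=-1, keepdim=True)  # rows sum to 1
        M = M / M.sum(dim=-2, keepdim=True)  # columns sum to 1
    return M

class MHCLayer(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self, n_streams: int, d_model: int):
        super().__init__()
        self.H_res_logits = torch.nn.Parameter(torch.zeros(n_streams, n_streams))
        self.H_pre_logits = torch.nn.Parameter(torch.zeros(n_streams))
        self.H_post_logits = torch.nn.Parameter(torch.zeros(n_streams))
        self.block = torch.nn.Linear(d_model, d_model)  # stand-in for F(., W_l)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_streams, batch, d_model) residual streams
        H_res = sinkhorn_knopp(self.H_res_logits)        # doubly stochastic
        h_pre = F.softmax(self.H_pre_logits, dim=0)      # non-negative read map
        h_post = F.softmax(self.H_post_logits, dim=0)    # non-negative write map
        mixed = torch.einsum("ij,jbd->ibd", H_res, x)    # H_res x_l
        pooled = torch.einsum("j,jbd->bd", h_pre, x)     # H_pre x_l
        out = self.block(pooled)                         # F(H_pre x_l, W_l)
        return mixed + h_post[:, None, None] * out       # + H_post^T F(...)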
Quick Start & Requirements
Training can be initiated from the examples/nanogpt/ directory using provided configuration files. Example commands include:
python train.py config/train_fineweb10B.py
python train.py config/train_fineweb10B_mhc.py
Multi-GPU training is demonstrated using torchrun. The primary dataset is FineWeb10B.
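For reference, a multi-GPU launch with torchrun typically looks like the line below; the GPU count and flags are illustrative, not taken from the repository.

torchrun --standalone --nproc_per_node=8 train.py config/train_fineweb10B_mhc.py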
Highlighted Details
H_res projection method using Newton-Schulz iteration; see the sketch below.
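As a hedged sketch of how such a projection could work (the repository's exact procedure is not described here): the Newton-Schulz iteration pushes a matrix toward its nearest orthogonal factor, and squaring an orthogonal matrix elementwise yields an orthostochastic, hence doubly stochastic, matrix. The function names and the elementwise-squaring step are assumptions.

import torch

def newton_schulz_orthogonalize(A: torch.Tensor, n_iters: int = 5) -> torch.Tensor:
    # Iterate X <- 1.5*X - 0.5*X X^T X toward the nearest orthogonal matrix.
    X = A / A.norm()  # Frobenius normalization keeps the spectral norm <= 1
    for _ in range(n_iters):
        X = 1.5 * X - 0.5 * X @ X.mT @ X
    return X

def orthostochastic_h_res(logits: torch.Tensor) -> torch.Tensor:
    Q = newton_schulz_orthogonalize(logits)
    # Rows and columns of an orthogonal matrix have unit norm, so its
    # elementwise square is doubly stochastic (orthostochastic).
    return Q * Q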
Maintenance & Community
No specific details regarding maintainers, community channels (such as Discord or Slack), or a public roadmap were found in the provided README.
Licensing & Compatibility
The project is licensed under the Apache 2.0 license. This license is generally permissive and compatible with commercial use and closed-source linking.
Limitations & Caveats
The implementation is explicitly a research prototype, prioritizing correctness over system performance optimizations. Several planned next steps, such as alternative orthogonalization operations or U-net variants, are not yet implemented. The orthostochastic option requires careful configuration of specific parameters.