AndreSlavescu: Accelerating deep learning with CUDA mHC kernels
This repository provides an unofficial CUDA implementation of DeepSeek-AI's Manifold-Constrained Hyper-Connections (mHC) layer. It targets researchers and engineers seeking to accelerate deep learning model training and inference on NVIDIA GPUs by offering highly optimized kernels. The primary benefit is substantial performance gains over standard PyTorch implementations.
How It Works
The project implements the mHC kernels directly in CUDA. It supports two modes: the default "Dynamic H Path," in which the H matrices are computed separately for each sample in the batch via learned projections, and a "Static H Path," which shares a single set of H matrices across the batch for faster inference. Running the layer as native CUDA kernels avoids the overhead of the standard PyTorch implementation, which is where the speedups come from.
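For readers who want to see what the kernels compute, here is a minimal pure-Python reference sketch. It is not the repository's CUDA code. It assumes the hyper-connection update x_next = H_res x + H_post^T F(H_pre x) over n residual streams, with H_res pushed toward a doubly stochastic matrix by Sinkhorn-Knopp iterations and sigmoid gates on H_pre and H_post; the exact parameterization used by the repository (and by the paper) may differ. All names (sinkhorn, build_h, mhc_update, forward_static, forward_dynamic) are hypothetical, and logits_fn is a stand-in for the learned projections of the Dynamic H Path.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Streams = List[Vector]                # one sample: n residual streams of width C
Logits = Tuple[Matrix, Vector, Vector]  # assumed (res, pre, post) pre-activations
H = Tuple[Matrix, Vector, Vector]       # constrained (H_res, H_pre, H_post)


def sinkhorn(logits: Matrix, iters: int = 20) -> Matrix:
    """Sinkhorn-Knopp: exponentiate, then alternately normalize rows and
    columns so the matrix approaches a doubly stochastic one."""
    m = [[math.exp(v) for v in row] for row in logits]
    n = len(m)
    for _ in range(iters):
        row_sums = [sum(row) for row in m]
        m = [[v / row_sums[i] for v in row] for i, row in enumerate(m)]
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def build_h(logits: Logits, iters: int = 20) -> H:
    """Map raw logits to constrained mixing weights (assumed parameterization)."""
    res, pre, post = logits
    return sinkhorn(res, iters), [sigmoid(v) for v in pre], [sigmoid(v) for v in post]


def mhc_update(x: Streams, h: H, f: Callable[[Vector], Vector]) -> Streams:
    """x_next[i] = sum_j H_res[i][j] * x[j] + H_post[i] * f(sum_j H_pre[j] * x[j])."""
    h_res, h_pre, h_post = h
    n, width = len(x), len(x[0])
    # Collapse the n streams into a single layer input using the pre weights.
    layer_in = [sum(h_pre[j] * x[j][c] for j in range(n)) for c in range(width)]
    y = f(layer_in)
    # Mix the streams with H_res and write the layer output back with H_post.
    return [[sum(h_res[i][j] * x[j][c] for j in range(n)) + h_post[i] * y[c]
             for c in range(width)] for i in range(n)]


def forward_static(batch: List[Streams], f: Callable[[Vector], Vector],
                   logits: Logits) -> List[Streams]:
    """Static H path: H is built once and shared by every sample in the batch."""
    h = build_h(logits)
    return [mhc_update(x, h, f) for x in batch]


def forward_dynamic(batch: List[Streams], f: Callable[[Vector], Vector],
                    logits_fn: Callable[[Streams], Logits]) -> List[Streams]:
    """Dynamic H path: logits_fn produces fresh logits for each sample,
    so H can differ from sample to sample."""
    return [mhc_update(x, build_h(logits_fn(x)), f) for x in batch]


if __name__ == "__main__":
    zeros: Logits = ([[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0], [0.0, 0.0])
    batch = [[[1.0, 2.0], [3.0, 4.0]]]
    print(forward_static(batch, lambda v: v, zeros))  # [[[3.0, 4.5], [3.0, 4.5]]]
```

In this sketch the two paths differ only in where build_h runs: once per batch in the static path and once per sample in the dynamic path, which is why sharing H can make inference cheaper.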
Quick Start & Requirements
- Install with make install; for development, use make install-dev.
- Build with make for all architectures, or target a specific NVIDIA architecture, e.g. make CUDA_ARCH=90 for H100.
- Run the tests with make test and the Python tests with make test-python.
- Run the benchmarks with make bench and the Python benchmarks with make bench-python.
- The runmodal.py script supports testing and benchmarking on cloud GPUs, e.g. modal run runmodal.py --gpu h100 --mode bench.
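Choosing the CUDA_ARCH value can be scripted. The sketch below assumes PyTorch is installed so it can query the GPU's compute capability with torch.cuda.get_device_capability, and it assumes the Makefile accepts other two-digit SM values in the same form as CUDA_ARCH=90 (only the H100 value is documented here). The helper names detect_capability and make_command are hypothetical; when no GPU is found, the sketch falls back to the plain make build for all architectures.

```python
import shutil
from typing import List, Optional, Tuple


def detect_capability() -> Optional[Tuple[int, int]]:
    """Return the (major, minor) compute capability of GPU 0, or None
    if PyTorch or a CUDA device is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


def make_command(capability: Optional[Tuple[int, int]]) -> List[str]:
    """Build the make argv: a single-architecture build when the capability
    is known (e.g. (9, 0) -> CUDA_ARCH=90 for H100), else the default build."""
    cmd = ["make"]
    if capability is not None:
        major, minor = capability
        cmd.append(f"CUDA_ARCH={major}{minor}")
    return cmd


if __name__ == "__main__":
    # The kernels need a CUDA toolkit, so warn early if nvcc is missing.
    if shutil.which("nvcc") is None:
        print("nvcc not found on PATH; a CUDA toolkit is required")
    print(" ".join(make_command(detect_capability())))
```

Printing the command instead of running it keeps the choice visible; the result can be passed to subprocess.run once it looks right.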
Maintenance & Community
Contribution guidelines are in CONTRIBUTING.md. The project implements the layer described in DeepSeek-AI's original mHC paper but is not an official DeepSeek-AI release. The README provides no community channels (such as Discord or Slack) or roadmap links.
Licensing & Compatibility
The license type is not explicitly stated in the provided README snippet, so it should be confirmed before commercial use or integration into closed-source projects.
Limitations & Caveats
The implementation is CUDA-specific and requires NVIDIA hardware and a compatible CUDA toolkit. As an unofficial implementation, it focuses on performance optimization rather than on being a full-featured library. The license status is unknown, which could limit adoption.