ai-infra-hpc by jinbooooom

AI Infrastructure and HPC essentials

Created 1 year ago
433 stars

Top 68.3% on SourcePulse

Project Summary

This repository serves as a comprehensive tutorial and knowledge base for AI infrastructure and High-Performance Computing (HPC), detailing low-level interconnects, parallel programming models, and large-scale model training techniques. It targets engineers and researchers needing a deep understanding of hardware-software co-design for demanding AI workloads, offering insights into optimizing performance from chip to cluster.

How It Works

The project systematically covers foundational HPC concepts, including CUDA programming, SIMD, and OpenMP, together with the critical interconnects and transports: PCIe, NVLink, InfiniBand, and RDMA. It then covers collective communication with MPI and NCCL, the major AI training paradigms (data, model, and pipeline parallelism), and supporting frameworks and libraries such as DeepSpeed and DeepEP. The content is structured to build understanding from hardware primitives up to complex distributed training strategies.
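
As a taste of the CUDA material, here is a minimal vector-add kernel of the kind the execution-model chapters build on; it is an illustrative sketch written for this summary, not code taken from the repository:

    // vec_add.cu -- compile with: nvcc vec_add.cu -o vec_add
    #include <cuda_runtime.h>
    #include <cstdio>

    // Each thread handles one element; blockIdx/threadIdx map the grid onto the data.
    __global__ void vecAdd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        // Managed memory keeps the sketch short; the tutorials also cover
        // explicit cudaMalloc/cudaMemcpy and the full device memory hierarchy.
        cudaMallocManaged(&a, n * sizeof(float));
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // enough blocks to cover n
        vecAdd<<<blocks, threads>>>(a, b, c, n);
        cudaDeviceSynchronize();                   // wait for the kernel to finish

        printf("c[0] = %.1f\n", c[0]);             // expect 3.0
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }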

Quick Start & Requirements

This repository is an educational resource rather than a runnable project: it provides no installation or execution commands, only detailed explanations and code snippets that illustrate the core concepts. Setup depends on individual learning goals and amounts to provisioning the relevant hardware (GPUs, InfiniBand NICs) and software (CUDA Toolkit, OFED).
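
Before diving in, a quick sanity check of the GPU environment can save time. The following sketch (the file name and scope are my own, assuming the CUDA Toolkit and a driver are installed) enumerates the visible GPUs and prints the properties the tutorials refer to:

    // device_query.cu -- compile with: nvcc device_query.cu -o device_query
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        cudaError_t err = cudaGetDeviceCount(&count);
        if (err != cudaSuccess || count == 0) {
            printf("No usable CUDA device: %s\n", cudaGetErrorString(err));
            return 1;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            // Compute capability, global memory, and SM count drive most of the
            // performance discussions in the CUDA chapters.
            printf("GPU %d: %s, compute %d.%d, %zu MiB, %d SMs\n",
                   i, prop.name, prop.major, prop.minor,
                   (size_t)(prop.totalGlobalMem >> 20), prop.multiProcessorCount);
        }
        return 0;
    }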

Highlighted Details

  • In-depth CUDA programming guide covering execution models, memory hierarchy, streams, concurrency, and debugging tools (Nsight, CUDA-GDB).
  • Detailed exploration of GPU interconnects like NVLink/NVSwitch and low-level communication protocols (GPUDirect, RDMA, InfiniBand, RoCE).
  • Comprehensive analysis of NCCL algorithms, protocols, and source code for efficient multi-GPU communication (see the usage sketch after this list).
  • Extensive coverage of distributed training strategies for large models, including DP, DDP, TP, PP, ZeRO, and DeepSpeed/DeepEP.
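
To make the NCCL item concrete, here is a minimal single-process all-reduce across all visible GPUs. It follows the standard NCCL usage pattern (ncclCommInitAll plus grouped ncclAllReduce calls) and is a sketch written for this summary, not code from the repository:

    // nccl_allreduce.cu -- compile with: nvcc nccl_allreduce.cu -lnccl
    #include <nccl.h>
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    int main() {
        int ndev = 0;
        cudaGetDeviceCount(&ndev);
        std::vector<ncclComm_t> comms(ndev);
        std::vector<float*> buf(ndev);
        std::vector<cudaStream_t> streams(ndev);
        const size_t count = 1 << 20;  // elements per rank

        for (int i = 0; i < ndev; ++i) {
            cudaSetDevice(i);
            cudaMalloc(&buf[i], count * sizeof(float));
            cudaStreamCreate(&streams[i]);
        }
        ncclCommInitAll(comms.data(), ndev, nullptr);  // one rank per visible device

        // Group the calls so NCCL can launch them together when a single
        // thread drives several devices; this summed all-reduce is the
        // primitive that data-parallel gradient synchronization builds on.
        ncclGroupStart();
        for (int i = 0; i < ndev; ++i)
            ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum,
                          comms[i], streams[i]);
        ncclGroupEnd();

        for (int i = 0; i < ndev; ++i) {
            cudaSetDevice(i);
            cudaStreamSynchronize(streams[i]);
            ncclCommDestroy(comms[i]);
            cudaFree(buf[i]);
        }
        printf("all-reduce across %d GPUs done\n", ndev);
        return 0;
    }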

Maintenance & Community

No information on contributors, community channels (Discord/Slack), or a roadmap is available for this repository.

Licensing & Compatibility

No license information is provided.

Limitations & Caveats

This is a learning repository, not a production-ready library. It assumes significant prior knowledge in systems programming and HPC. The content is a collection of notes and tutorials, requiring users to synthesize and apply the information to specific use cases.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

61 stars in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai
0% · 309 stars
Framework for large-scale transformer optimization
Created 4 years ago · Updated 3 years ago
Starred by George Hotz (Author of tinygrad; Founder of the tiny corp, comma.ai) and Carol Willing (Core Contributor to CPython, Jupyter).

ai-performance-engineering by cfregly
1.2% · 1k stars
AI Systems Performance Engineering for modern AI workloads
Created 1 year ago · Updated 4 weeks ago
Starred by David Cournapeau (Author of scikit-learn), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 5 more.

lectures by gpu-mode
0.5% · 6k stars
Lecture series for GPU-accelerated computing
Created 2 years ago · Updated 5 days ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Co-founder of ClickHouse), and 29 more.

llm.c by karpathy
0.3% · 30k stars
LLM training in pure C/CUDA, no PyTorch needed
Created 2 years ago · Updated 10 months ago