AIInfra by Infrasys-AI

AI infrastructure course for large model systems design

created 1 year ago
3,716 stars

Top 13.3% on sourcepulse

Project Summary

This repository provides a comprehensive, open-source curriculum on AI infrastructure, focusing on the hardware and software stack required for large language model (LLM) training and inference. It targets advanced undergraduates, graduate students, and AI system practitioners who want to understand the end-to-end lifecycle of AI systems, and it offers a structured learning path covering everything from AI chip architecture to distributed training and cutting-edge LLM algorithms.

How It Works

The curriculum is structured into modules covering AI chip principles, communication and storage, AI clusters, LLM training, LLM inference, LLM algorithms, and hot technical topics. It delves into hardware architectures (GPUs, TPUs, NPUs), distributed systems concepts (parallelism, communication libraries like NCCL), LLM-specific techniques (Transformer variants, quantization, efficient inference), and emerging trends like AI agents. The approach emphasizes a full-stack perspective, bridging hardware capabilities with software frameworks and algorithmic advancements.
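
To give a flavor of the communication primitives the distributed-training modules cover, the snippet below is a minimal sketch (not taken from the course materials) of a data-parallel gradient all-reduce using torch.distributed over the NCCL backend; the tensor shape, launch setup, and one-GPU-per-rank assumption are illustrative choices, not details from the course.

    # Minimal illustrative sketch: a data-parallel gradient all-reduce using
    # torch.distributed with the NCCL backend. Assumes launch via torchrun,
    # which sets RANK, WORLD_SIZE, and LOCAL_RANK, and one CUDA device per rank.
    import os
    import torch
    import torch.distributed as dist

    def main():
        dist.init_process_group(backend="nccl")
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        torch.cuda.set_device(local_rank)
        rank = dist.get_rank()

        # Stand-in for a local gradient produced by backward() on this rank's data shard.
        grad = torch.full((4,), float(rank), device="cuda")

        # Sum across ranks, then average: the core synchronization step of
        # data parallelism (DP), executed by NCCL under the hood.
        dist.all_reduce(grad, op=dist.ReduceOp.SUM)
        grad /= dist.get_world_size()

        print(f"rank {rank}: {grad.tolist()}")
        dist.destroy_process_group()

    if __name__ == "__main__":
        main()

Launched with torchrun --nproc_per_node=<num_gpus>, every rank prints the same averaged vector, which is the invariant data-parallel training relies on.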

Quick Start & Requirements

  • Content Access: Because the repository exceeds 10 GB, the course materials are distributed via the Releases page; a direct git clone is discouraged because it can be very slow (one way to script the download is sketched after this list).
  • Prerequisites: No software installation is required to view the curriculum content (PPT slides), but following the material assumes foundational knowledge of computer architecture, operating systems, and deep learning.
  • Resources: Budget enough disk space for the downloaded materials; the full repository exceeds 10 GB.
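
For readers who prefer to script the download, here is a hedged sketch that queries GitHub's public Releases API for the latest release and prints its downloadable assets. The repository path Infrasys-AI/AIInfra is inferred from the project name, and the actual release and asset layout may differ.

    # Hedged sketch: list the assets of the latest GitHub release so individual
    # files can be downloaded instead of cloning the 10 GB+ repository.
    # Uses the public GitHub REST API; unauthenticated requests are rate-limited.
    import json
    import urllib.request

    API = "https://api.github.com/repos/Infrasys-AI/AIInfra/releases/latest"

    with urllib.request.urlopen(API) as resp:
        release = json.loads(resp.read())

    print("Release:", release.get("tag_name"))
    for asset in release.get("assets", []):
        size_mb = asset["size"] / 1e6
        print(f"{asset['name']:50s} {size_mb:8.1f} MB  {asset['browser_download_url']}")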

Highlighted Details

  • Comprehensive coverage of AI hardware, including NVIDIA GPUs, TPUs, NPUs, and domestic AI processors.
  • In-depth exploration of distributed training techniques (tensor, pipeline, expert, sequence, and data parallelism: TP, PP, EP, SP, DP) and communication primitives.
  • Analysis of LLM inference optimization frameworks (vLLM, SGLang) and techniques such as quantization (sketched after this list) and long-sequence handling.
  • Regular updates on the latest LLM algorithms and hot topics, including specific model deep dives (e.g., DeepSeek, Llama).
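
To make the quantization bullet concrete, below is a minimal illustrative sketch (not from the course materials) of symmetric per-channel INT8 weight quantization in NumPy, the basic idea behind many of the inference-optimization schemes the curriculum surveys.

    # Illustrative sketch: symmetric per-channel INT8 weight quantization.
    # Each output channel gets one scale so that its largest |w| maps to 127.
    import numpy as np

    def quantize_per_channel(w: np.ndarray):
        """Quantize a [out_features, in_features] float32 weight matrix to int8."""
        scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
        scale = np.where(scale == 0.0, 1.0, scale)      # guard all-zero rows
        q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
        return q, scale

    def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 8).astype(np.float32)
    q, scale = quantize_per_channel(w)
    print("max abs reconstruction error:", float(np.abs(w - dequantize(q, scale)).max()))

Production inference stacks combine weight rounding like this with calibrated activation handling and fused low-precision kernels; the sketch shows only the weight-quantization step.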

Maintenance & Community

The project is actively maintained by chenzomi12, with content being continuously updated. Contributions are welcomed via Pull Requests. Video content is hosted on Bilibili and YouTube (ZOMI酱, ZOMI6222).

Licensing & Compatibility

The repository content is open-source, with an expectation of proper attribution when the PPT materials are reused. Specific license terms for the content are not explicitly stated beyond a general encouragement to use it and contribute.

Limitations & Caveats

Many sections are marked as "待更" (to be updated) or "更新中" (updating), indicating the curriculum is still under active development. The sheer size of the repository may pose challenges for users with limited bandwidth or storage.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 56
  • Issues (30d): 3

Star History

1,443 stars in the last 90 days

Explore Similar Projects

Starred by Omar Sanseviero (DevRel at Google DeepMind) and Stas Bekman (author of the Machine Learning Engineering Open Book; Research Engineer at Snowflake).

cookbook by EleutherAI

Deep learning resource for practical model work. 809 stars (top 0.1% on sourcepulse); created 1 year ago, updated 4 days ago.