dl_note by harleyszhang

Deep learning notes covering fundamentals, optimization, and deployment

created 2 years ago
487 stars

Top 64.1% on sourcepulse

Project Summary

This repository provides comprehensive personal notes and practical guides for deep learning, focusing on computer vision and large language models. It targets engineers and researchers seeking to understand foundational concepts, advanced techniques like model compression and inference optimization, and practical implementation details. The project aims to demystify complex deep learning topics through clear explanations and code examples.

How It Works

The project is organized into distinct sections covering mathematical foundations, core neural network components, classic CNN architectures, hyperparameter tuning ("alchemy"), model compression algorithms, and inference deployment strategies. A key highlight is a custom inference framework built with Triton and PyTorch, designed for ease of use and performance; it claims matrix-multiplication speeds comparable to cuBLAS and significant acceleration over standard libraries for specific models.
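To give a flavor of the Triton-plus-PyTorch style the framework advertises, below is a minimal tiled matrix-multiplication kernel. This is an illustrative sketch, not the repository's actual kernel: the kernel name, block sizes, and wrapper function are assumptions, and it expects float32 CUDA tensors.

```python
# Minimal tiled matmul in Triton (illustrative sketch; requires a CUDA GPU).
import torch
import triton
import triton.language as tl

@triton.jit
def matmul_kernel(a_ptr, b_ptr, c_ptr, M, N, K,
                  stride_am, stride_ak, stride_bk, stride_bn,
                  stride_cm, stride_cn,
                  BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr, BLOCK_K: tl.constexpr):
    # Each program instance computes one BLOCK_M x BLOCK_N tile of C.
    pid_m = tl.program_id(0)
    pid_n = tl.program_id(1)
    offs_m = pid_m * BLOCK_M + tl.arange(0, BLOCK_M)
    offs_n = pid_n * BLOCK_N + tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    acc = tl.zeros((BLOCK_M, BLOCK_N), dtype=tl.float32)
    # Walk along K, accumulating partial tile products.
    for k in range(0, K, BLOCK_K):
        a = tl.load(a_ptr + offs_m[:, None] * stride_am + (k + offs_k)[None, :] * stride_ak,
                    mask=(offs_m[:, None] < M) & ((k + offs_k)[None, :] < K), other=0.0)
        b = tl.load(b_ptr + (k + offs_k)[:, None] * stride_bk + offs_n[None, :] * stride_bn,
                    mask=((k + offs_k)[:, None] < K) & (offs_n[None, :] < N), other=0.0)
        acc += tl.dot(a, b)
    tl.store(c_ptr + offs_m[:, None] * stride_cm + offs_n[None, :] * stride_cn, acc,
             mask=(offs_m[:, None] < M) & (offs_n[None, :] < N))

def matmul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """C = A @ B for float32 CUDA tensors (hypothetical wrapper)."""
    M, K = a.shape
    K2, N = b.shape
    assert K == K2
    c = torch.empty((M, N), device=a.device, dtype=torch.float32)
    grid = (triton.cdiv(M, 64), triton.cdiv(N, 64))
    matmul_kernel[grid](a, b, c, M, N, K,
                        a.stride(0), a.stride(1), b.stride(0), b.stride(1),
                        c.stride(0), c.stride(1),
                        BLOCK_M=64, BLOCK_N=64, BLOCK_K=32)
    return c
```

The PyTorch-like part is the launch: the kernel is called like a Python function on torch tensors, with tiling and masking handled in array notation rather than raw CUDA threads.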

Quick Start & Requirements

  • Installation: clone the repository and follow the instructions for each section.
  • Prerequisites: Python, PyTorch, and CUDA for GPU acceleration. Individual sections may require additional libraries, detailed in their respective directories.
  • Resources: setup time varies by section; the inference framework examples may require significant GPU memory and compute for testing.
  • Links:
    • Custom Inference Framework Course: available via a QR code in the README
    • LLM Notes: the llm_note repository (link not provided)
    • AI-System: the AI-System repository (link not provided)
    • PyTorch Deep Learning: the pytorch-deep-learning repository (link not provided)

Highlighted Details

  • Custom inference framework uses Triton for GPU kernels, offering PyTorch-like syntax (see the matmul sketch under "How It Works").
  • Claims up to a 4x speedup on Llama3 1B/3B models compared to Hugging Face Transformers.
  • Supports FlashAttention (V1-V3), GQA, and PagedAttention; a minimal GQA sketch follows this list.
  • Includes detailed explanations of model compression techniques (pruning, distillation, quantization); a quantization sketch also follows below.
  • Covers heterogeneous computing with ARM NEON and CUDA programming.
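For the GQA bullet above, here is a minimal PyTorch sketch of grouped-query attention's core idea: several query heads share each key/value head, so the KV heads are expanded before standard scaled-dot-product attention. This is a plain PyTorch illustration under assumed tensor shapes, not the repository's Triton implementation.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim)."""
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    group = n_q_heads // n_kv_heads  # query heads served by each KV head
    # Expand each KV head so every query head in its group attends to it.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Example: 8 query heads sharing 2 KV heads (group size 4).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # shape (1, 8, 16, 64)
```

The benefit is a smaller KV cache: here the cache holds 2 heads instead of 8, which is what makes GQA attractive for LLM inference.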
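As an illustration of the quantization topic in the compression bullet, the sketch below implements symmetric per-tensor int8 post-training quantization, one of the simplest schemes such notes typically cover. The function names and per-tensor granularity are assumptions for illustration, not the repository's code.

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-tensor quantization: one scale maps the largest
    # absolute weight onto the int8 range [-127, 127].
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(256, 256)
q, scale = quantize_int8(w)
err = (w - dequantize_int8(q, scale)).abs().max()
print(f"max abs quantization error: {err:.6f}")
```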

Maintenance & Community

The project appears to be a personal endeavor, with ongoing updates mentioned for the paid course. A WeChat public account ("嵌入式视觉", "Embedded Vision") is provided for community engagement and content updates.

Licensing & Compatibility

The repository's licensing is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The core inference framework is presented as a paid course, with details and access contingent on purchase. Some linked external resources may not be directly hosted or maintained within this repository. The project's scope is broad, and depth may vary across sections.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 26 stars in the last 90 days
