sass-king by florianmattana

Reverse engineering NVIDIA SASS for performance analysis

Created 1 month ago
264 stars

Top 96.5% on SourcePulse

Project Summary

Summary

SASS King addresses a gap in public understanding of SASS, the native machine instruction set of NVIDIA GPUs, on recent architectures where that instruction set changed significantly. It gives kernel engineers and researchers a structured knowledge base and a reverse-engineering methodology. These support analyzing SASS dumps, identifying compiler patterns, and linking binary structures back to source-level optimizations, which makes performance analysis on newer NVIDIA architectures more transparent.

How It Works

The project combines controlled micro-kernels with audits of production kernels. Each micro-kernel varies a single parameter, so any resulting difference in the SASS can be attributed to one compiler decision. This evidence feeds a knowledge base and a formal pattern library of reusable audit signatures. Every claim carries a tag ([OBS], [INF], [HYP], [RES], [GAP]) so it can be traced back to its evidence. The approach favors pattern-based audits over purely bit-level disassembly.
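The following is a minimal sketch of the single-parameter-variation idea, not SASS King's own tooling. It counts base opcodes in two SASS listings and reports the per-opcode difference.

Assumptions:
- The input is text in the `/*offset*/ OPCODE operands ;` layout that `cuobjdump -sass` prints.
- The two listings are hand-written placeholders standing in for a baseline and an unrolled variant.
- The function names are hypothetical.

Prefixing the output with [OBS] borrows the project's tag vocabulary for a directly observed difference.

```python
import re
from collections import Counter

# Opcode in a cuobjdump-style line such as
#   /*0040*/   @P0 BRA 0x80 ;
# The optional guard ("@P0", "@!P1", "@PT") is skipped; the trailing
# "/* 0x... */" encoding comment never matches because of its space.
_SASS_LINE = re.compile(r"/\*[0-9a-fA-F]{4,}\*/\s+(?:@!?P[T0-9]\s+)?([A-Z][A-Z0-9_]*)")


def opcode_histogram(sass_text: str) -> Counter:
    """Count base opcodes; modifiers after the first '.' are dropped."""
    counts = Counter()
    for line in sass_text.splitlines():
        m = _SASS_LINE.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def diff_histograms(baseline: Counter, variant: Counter) -> dict:
    """Per-opcode change (variant minus baseline); unchanged opcodes omitted."""
    delta = {}
    for op in sorted(set(baseline) | set(variant)):
        d = variant[op] - baseline[op]
        if d:
            delta[op] = d
    return delta


# Hand-written placeholders, not real compiler output.
BASELINE = """
        /*0000*/                   LDG.E R2, desc[UR4][R2.64] ;
        /*0010*/                   FFMA R0, R2, R2, R0 ;
        /*0020*/               @P0 BRA 0x0 ;
        /*0030*/                   EXIT ;
"""
VARIANT = """
        /*0000*/                   LDG.E R2, desc[UR4][R2.64] ;
        /*0010*/                   LDG.E R3, desc[UR4][R2.64+0x4] ;
        /*0020*/                   FFMA R0, R2, R2, R0 ;
        /*0030*/                   FFMA R0, R3, R3, R0 ;
        /*0040*/               @P0 BRA 0x0 ;
        /*0050*/                   EXIT ;
"""

if __name__ == "__main__":
    delta = diff_histograms(opcode_histogram(BASELINE), opcode_histogram(VARIANT))
    for op, d in delta.items():
        print(f"[OBS] {op}: {d:+d}")  # prints "[OBS] FFMA: +1" then "[OBS] LDG: +1"
```

A histogram diff only shows what changed. Attributing the change to a specific compiler decision (here, presumably loop unrolling) is an inference that needs further support, such as the instruction order around the new loads.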

Quick Start & Requirements

  • Primary Interaction: This is a research repository with no installation commands (pip, Docker). Users work directly with the corpus, knowledge base, and pattern library.
  • Prerequisites: The NVIDIA CUDA toolkit (nvcc, cuobjdump), Nsight Compute, and external tools such as gpuasm.com and redplait/denvdis. The initial focus is SM120 / SM120a (consumer Blackwell).
  • Documentation: Key entry points include docs/START_HERE.md and patterns/README.md.
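One way to script the controlled variants is sketched below under stated assumptions:
- The kernel file `gemm_tile.cu` and the `UNROLL` macro are hypothetical.
- Targeting `sm_120a` requires a CUDA toolkit recent enough to support SM120, and the flags may need adjusting for your toolkit.

The helper only builds argument lists for `nvcc -cubin` and `cuobjdump -sass`, so that exactly one preprocessor define differs between variants. It does not run anything.

```python
import shlex
from pathlib import Path


def variant_commands(source: str, arch: str, param: str, values):
    """Yield (nvcc, cuobjdump) argument lists, one pair per value of `param`.

    Only the -D define (and the output file name) changes between variants,
    so differences in the resulting SASS can be attributed to that parameter.
    """
    stem = Path(source).stem
    for value in values:
        cubin = f"{stem}_{param}{value}.cubin"
        nvcc = ["nvcc", "-cubin", f"-arch={arch}", f"-D{param}={value}", "-o", cubin, source]
        dump = ["cuobjdump", "-sass", cubin]
        yield nvcc, dump


if __name__ == "__main__":
    # Hypothetical kernel file and macro name.
    for nvcc, dump in variant_commands("gemm_tile.cu", "sm_120a", "UNROLL", [1, 2, 4]):
        print(shlex.join(nvcc))
        print(shlex.join(dump))
```

To execute the commands, pass each list to `subprocess.run(cmd, check=True)`. For the `cuobjdump` step, add `capture_output=True, text=True` so each variant's SASS text can be saved for later comparison.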

Highlighted Details

  • Phase 3 Pattern Library: 29 formalized, reusable SASS signatures that serve as building blocks for audits. Each is documented with an explanation, variants, anti-patterns, and a confidence level.
  • Tensor-Core Focus: Comprehensive studies cover SM120 tensor-core operations (HMMA, QMMA, OMMA) and matrix memory pipelines (LDSM, STSM), including epilogues and sparse metadata.
  • Cross-Architecture Expansion: Plans to expand methodology and patterns across multiple NVIDIA architectures (SM80, SM86, SM89, SM90a, SM100a).
  • Evidence-Driven: Utilizes strict claim tagging ([OBS], [INF], [HYP], [RES], [GAP]) to ensure technical claims are tied to observed or inferred evidence.
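A pattern-library entry can be pictured as a small record that pairs an instruction signature with a confidence level. The sketch below is an assumption about such a record, not the schema used in patterns/.

- `PatternSignature`, its fields, and the `ldsm_feeds_hmma` example are hypothetical.
- The listing is hand-written rather than real SM120 output.
- A signature here is simply an ordered list of base opcodes that must appear, in that order, somewhere in the dump.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Opcode in a cuobjdump-style line, skipping an optional "@P0"/"@!P1"/"@PT" guard.
_OPCODE = re.compile(r"/\*[0-9a-fA-F]{4,}\*/\s+(?:@!?P[T0-9]\s+)?([A-Z][A-Z0-9_]*)")


@dataclass(frozen=True)
class PatternSignature:
    """Hypothetical pattern-library entry (not SASS King's actual schema)."""
    name: str
    opcodes: tuple[str, ...]  # base opcodes that must appear in this order
    confidence: str           # e.g. "high", "medium", "low"


def opcode_stream(sass_text: str) -> list[str]:
    """Base opcodes in program order; modifiers after the first '.' are dropped."""
    stream = []
    for line in sass_text.splitlines():
        m = _OPCODE.search(line)
        if m:
            stream.append(m.group(1))
    return stream


def matches_in_order(stream: list[str], wanted: tuple[str, ...]) -> bool:
    """True if `wanted` is a subsequence of `stream` (gaps allowed)."""
    it = iter(stream)
    # `op in it` advances the shared iterator past each match, enforcing order.
    return all(op in it for op in wanted)


def audit(sass_text: str, sig: PatternSignature) -> str:
    found = matches_in_order(opcode_stream(sass_text), sig.opcodes)
    status = "present" if found else "absent"
    return f"[OBS] {sig.name}: {status} (pattern confidence: {sig.confidence})"


# Hand-written illustrative listing, not real compiler output.
KERNEL = """
        /*0100*/                   LDSM.16.M88.4 R8, [R20] ;
        /*0110*/                   LDSM.16.M88.2 R12, [R21] ;
        /*0120*/                   HMMA.16816.F32 R4, R8, R12, R4 ;
        /*0130*/                   EXIT ;
"""

if __name__ == "__main__":
    sig = PatternSignature("ldsm_feeds_hmma", ("LDSM", "HMMA"), "medium")
    print(audit(KERNEL, sig))
```

Checking only opcode order is deliberately loose: it does not confirm that the HMMA actually consumes the registers the LDSM instructions wrote. A stricter audit would also track register dataflow between the matched instructions.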

Maintenance & Community

Authored by Florian Mattana. Contributions are welcome; guidelines are in CONTRIBUTING.md. No specific community channels or sponsorship details are mentioned.

Licensing & Compatibility

The license type is not explicitly stated in the provided README text. Compatibility is focused on NVIDIA GPU architectures.

Limitations & Caveats

SASS ISA coverage is incomplete. Runtime layout decoding, full control-code bit placement, and cross-architecture replay are listed as future work. The project is under active development, with Phase 4 (Production Audits) as the next major step, so it is not yet a fully mature, stable toolset for all use cases.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 20
  • Issues (30d): 6
  • Star History: 75 stars in the last 30 days
