NVIDIA CUDA Tile kernel library for efficient GPU programming
Top 57.7% on SourcePulse
Summary
TileGym is a CUDA Tile kernel library designed to simplify and accelerate tile-based GPU programming. It offers a comprehensive collection of tutorials and practical examples, targeting developers learning GPU kernel optimization or seeking to enhance large language model (LLM) performance. By providing efficient kernel implementations and end-to-end integration examples with models like Llama 3.1 and DeepSeek V2, TileGym enables users to build and benchmark high-performance GPU kernels.
How It Works
The project uses CUDA Tile to provide optimized kernel implementations for common deep learning operations. It focuses on practical tile-based programming patterns, showing how careful memory access and computation tiling improve efficiency. The integration examples show how these kernels can be used to accelerate inference for popular LLMs.
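To make the idea of computation tiling concrete, here is a minimal pure-Python sketch of a tiled matrix multiply. It is not TileGym code and does not use the CUDA Tile API; the function name `tiled_matmul` and the `tile` parameter are hypothetical, chosen only for illustration. It shows the loop structure that tile-based kernels typically follow: the output is split into tiles, and each tile accumulates partial products over tiles of the shared dimension.

```python
def tiled_matmul(a, b, tile=2):
    """Multiply a (m x k) by b (k x n) one tile at a time.

    Illustrative only: a hypothetical helper, not part of TileGym.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # Outer loops walk over output tiles. On a GPU, each output tile
    # would typically be assigned to one thread block.
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            # Walk the shared dimension tile by tile, so only a small
            # slice of a and b is needed at any one time.
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for p in range(k0, min(k0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
result = tiled_matmul(a, b, tile=2)
```

The tile size does not change the result, only the order in which work is done. On real hardware that order decides how often data can be reused from fast on-chip memory instead of being reloaded from global memory.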
Quick Start & Requirements
Install: cd into the repository directory and run "pip install ." (for an editable install, run "pip install -e ."). A Dockerfile is also provided.
cutile-python: https://github.com/nvidia/cutile-python
Highlighted Details
Maintenance & Community
The project welcomes contributions and outlines guidelines in CONTRIBUTING.md, including a Contributor License Agreement (CLA) process. The README does not list community channels or a roadmap.
Licensing & Compatibility
Limitations & Caveats
Currently, TileGym is built and tested exclusively on CUDA 13.1 and requires NVIDIA Blackwell architecture GPUs. Support for other GPU architectures is planned for future releases.