FlashML-org
Accelerating classical machine learning with GPU operators
Top 61.1% on SourcePulse
FlashML-org/flashlib provides fast, memory-efficient classical machine learning operators for GPUs. It targets engineers and researchers who want to speed up ML pipelines by replacing CPU-bound or less efficient GPU implementations with specialized, high-throughput kernels.
How It Works
The library is built on Triton and CuteDSL, which it uses to write custom GPU kernels for classical ML tasks. This low-level control enables optimizations in both computation and memory access that, according to the project, let its kernels outperform standard implementations. The design emphasizes performance through a diverse set of specialized primitives and efficient data flow on NVIDIA GPUs.
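To illustrate the kind of memory-saving data flow a custom kernel makes possible, here is a minimal pure-Python sketch (not FlashLib's code) of a fused k-means assignment step: distances to the centroids are computed one tile of centroids at a time and reduced to a running minimum, so the full n × k distance matrix is never stored. In a real Triton or CuteDSL kernel each tile would typically be staged in on-chip memory; the function name `fused_assign` and the `tile` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def fused_assign(
    points: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
    tile: int = 2,
) -> Tuple[List[int], List[float]]:
    """Return (nearest centroid index, squared distance) for each point.

    Hypothetical sketch: centroids are visited in tiles of size `tile`, and
    only a running (best_index, best_dist) pair is kept per point, mimicking
    a fused distance + argmin kernel that never stores the n x k matrix.
    """
    labels: List[int] = []
    dists: List[float] = []
    for p in points:
        best_idx, best_dist = -1, float("inf")
        # Walk the centroids tile by tile; on a GPU each tile would
        # typically be loaded into shared memory before use.
        for start in range(0, len(centroids), tile):
            for j in range(start, min(start + tile, len(centroids))):
                d = sum((a - b) ** 2 for a, b in zip(p, centroids[j]))
                # Strict < keeps the lowest centroid index on ties.
                if d < best_dist:
                    best_idx, best_dist = j, d
        labels.append(best_idx)
        dists.append(best_dist)
    return labels, dists


if __name__ == "__main__":
    pts = [[0.0, 0.0], [5.0, 5.0], [0.9, 1.1]]
    cents = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    print(fused_assign(pts, cents))
```

Because only the per-point running minimum survives each tile, the extra memory is O(n) rather than O(n·k), and the tile size changes only the traversal order, not the result.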
Quick Start & Requirements
Install with pip install flashlib, or from source: git clone https://github.com/FlashML-org/flashlib.git, then run pip install -e . inside the cloned directory.
Highlighted Details
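After installing, one way to confirm which version the interpreter actually sees is the standard-library `importlib.metadata` module. This sketch assumes the installed distribution is named `flashlib`, matching the pip command above; it calls no FlashLib API, and the helper name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "flashlib") -> Optional[str]:
    """Return the installed version string of `dist_name`, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in this environment.
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"flashlib {v}" if v else "flashlib is not installed")
```

This works for both a regular and an editable (pip install -e .) install, since both register package metadata.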
A flashlib.info submodule estimates runtime, FLOPs, and HBM bytes in about 5 µs on the CPU, which helps with pipeline budgeting.
Provides both flash_* functions and scikit-learn-style classes.
Maintenance & Community
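An estimate that takes microseconds on the CPU is presumably analytic rather than measured. The sketch below shows one common analytic approach, a roofline model, applied to the distance computation of a k-means assignment step. It is an assumption about the general technique, not FlashLib's formulas; the peak-throughput and bandwidth defaults are placeholders rather than specs for any particular GPU, and the names `Estimate` and `kmeans_assign_estimate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    flops: float      # floating-point operations
    hbm_bytes: float  # bytes moved to/from HBM (global memory)
    seconds: float    # roofline lower bound on runtime


def kmeans_assign_estimate(
    n: int,
    k: int,
    d: int,
    dtype_bytes: int = 4,
    peak_flops: float = 50e12,  # placeholder: 50 TFLOP/s
    hbm_bw: float = 2e12,       # placeholder: 2 TB/s
) -> Estimate:
    """Roofline estimate for a fused distance + argmin pass (hypothetical).

    FLOPs: the point-centroid dot products dominate, one multiply and one
    add per (point, centroid, dimension) triple -> 2 * n * k * d.
    Bytes: read points (n*d) and centroids (k*d) once, write one int32 label
    per point; a fused kernel never writes the n x k distance matrix.
    """
    flops = 2.0 * n * k * d
    hbm_bytes = float(dtype_bytes * (n * d + k * d) + 4 * n)
    # Runtime is bounded below by whichever resource saturates first.
    seconds = max(flops / peak_flops, hbm_bytes / hbm_bw)
    return Estimate(flops, hbm_bytes, seconds)


if __name__ == "__main__":
    print(kmeans_assign_estimate(n=1_000_000, k=1024, d=128))
```

With these placeholder numbers, one million 128-dimensional points against 1,024 centroids come out compute-bound (about 5.2 ms of arithmetic versus about 0.26 ms of memory traffic), which is the kind of conclusion such an estimate supports when budgeting a pipeline.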
The provided README does not detail specific contributors, sponsorships, community channels (e.g., Discord, Slack), or a public roadmap.
Licensing & Compatibility
Licensed under the Apache License 2.0, which permits commercial use and integration into closed-source projects.
Limitations & Caveats
FlashLib focuses exclusively on classical machine learning operators and does not cover deep learning models. Its core functionality requires a CUDA-enabled GPU.
microsoft
tunib-ai
baidu-research
szilard
gpu-mode
ggml-org
Dao-AILab