ratchet by huggingface

Browser ML framework for cross-platform GPU inference

Created 2 years ago

739 stars

Top 47.0% on SourcePulse

4 Experts Love This Project

transitive-bullshit: Founder of Agentic

thomwolf: Cofounder of Hugging Face

julien-c (Julien Chaumond): Cofounder of Hugging Face

osanseviero (Omar Sanseviero): DevRel at Google DeepMind

Project Summary

Ratchet is a cross-platform machine learning framework designed for web-first deployment, enabling GPU-accelerated inference in browsers and native applications. It targets developers seeking to integrate performant AI into existing production environments, offering a toolkit focused on inference, WebGPU/CPU execution, quantization, lazy computation, and in-place operations.

How It Works

Ratchet leverages WebGPU for hardware-accelerated computation, providing a unified API for both browser and native environments. Its design prioritizes efficient inference through first-class quantization support and lazy computation, minimizing overhead and maximizing performance on diverse hardware.
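To make "lazy computation" concrete, here is a minimal TypeScript sketch of the general idea: operations only record a graph node, and arithmetic runs when a result is requested. The `LazyTensor` class, its methods, and the `evaluations` counter are hypothetical names invented for this illustration; they are not Ratchet's API, and the sketch runs on the CPU rather than on WebGPU.

```typescript
// Hypothetical illustration of lazy evaluation; not Ratchet's actual API.
type Op =
  | { kind: "const"; data: Float32Array }
  | { kind: "add"; lhs: LazyTensor; rhs: LazyTensor }
  | { kind: "scale"; src: LazyTensor; factor: number };

class LazyTensor {
  // Counts how many graph nodes have actually been computed.
  static evaluations = 0;

  private readonly op: Op;
  private cached: Float32Array | null = null;

  private constructor(op: Op) {
    this.op = op;
  }

  static from(values: number[]): LazyTensor {
    return new LazyTensor({ kind: "const", data: Float32Array.from(values) });
  }

  // Building the graph does no arithmetic; it only records the operation.
  add(other: LazyTensor): LazyTensor {
    return new LazyTensor({ kind: "add", lhs: this, rhs: other });
  }

  scale(factor: number): LazyTensor {
    return new LazyTensor({ kind: "scale", src: this, factor });
  }

  // Work happens here, once, and the result is cached for later calls.
  resolve(): Float32Array {
    if (this.cached === null) {
      this.cached = this.compute();
    }
    return this.cached;
  }

  private compute(): Float32Array {
    LazyTensor.evaluations += 1;
    const op = this.op;
    if (op.kind === "const") {
      return op.data;
    }
    if (op.kind === "add") {
      const a = op.lhs.resolve();
      const b = op.rhs.resolve();
      if (a.length !== b.length) {
        throw new Error("length mismatch in add");
      }
      return a.map((v, i) => v + b[i]);
    }
    // Only the "scale" variant remains at this point.
    const factor = op.factor;
    return op.src.resolve().map((v) => v * factor);
  }
}

// Usage: nothing is computed until resolve() is called.
//   const y = LazyTensor.from([1, 2]).add(LazyTensor.from([3, 4])).scale(2);
//   y.resolve(); // Float32Array [8, 12]
```

Deferring execution this way gives a framework the chance to inspect the whole graph before running it, which is where optimizations such as fusing steps or reusing buffers would typically be applied.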

Quick Start & Requirements

Install/Run: Try the hosted demos on Hugging Face Spaces (Whisper, Phi).
Prerequisites: A web browser with WebGPU support. A JavaScript API is demonstrated; the Rust crate and CLI are forthcoming.
Resources: Demo sites are available for immediate testing.
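
Since the demos require a WebGPU-capable browser, a page can check for support before loading a model. The sketch below uses the standard `navigator.gpu.requestAdapter()` call from the WebGPU API; the helper names `exposesWebGPU` and `canUseWebGPU` and the `NavigatorLike` interface are assumptions made for this example, and are not part of Ratchet.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

interface NavigatorLike {
  gpu?: GpuLike;
}

// Cheap synchronous check: does the environment expose the WebGPU entry point?
function exposesWebGPU(nav: NavigatorLike | undefined): boolean {
  return nav !== undefined && nav.gpu !== undefined;
}

// Stronger check: an adapter must actually be obtainable.
// requestAdapter() resolves to null when no suitable adapter exists.
async function canUseWebGPU(nav: NavigatorLike | undefined): Promise<boolean> {
  if (nav === undefined || nav.gpu === undefined) {
    return false;
  }
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false;
  }
}

// Usage in a browser (hypothetical call site):
//   const nav = (globalThis as { navigator?: NavigatorLike }).navigator;
//   canUseWebGPU(nav).then((ok) => console.log(ok ? "WebGPU ready" : "WebGPU unavailable"));
```

Passing the navigator object in as a parameter keeps the helpers usable outside a browser, for example in unit tests with a stub.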

Highlighted Details

Supports Whisper, Phi 2 & 3, and Moondream models, with Gemma 2 2B upcoming.
Features asynchronous loading and caching via IndexedDB for web applications.
Emphasizes quantization (e.g., Q8) for performance optimization.
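
As an illustration of what 8-bit quantization such as Q8 involves, the sketch below implements a generic symmetric, block-wise int8 scheme in TypeScript: each block of weights stores one floating-point scale plus one signed byte per value. The block size of 32, the `QuantizedBlocks` layout, and the function names are assumptions for this example; the source does not describe Ratchet's actual Q8 format, which may differ.

```typescript
// Assumed block size; common block-wise schemes use small fixed blocks.
const BLOCK_SIZE = 32;

// Hypothetical storage layout: one scale per block, one int8 per value.
interface QuantizedBlocks {
  scales: Float32Array;
  values: Int8Array;
  length: number; // original (unpadded) element count
}

function quantizeQ8(input: Float32Array, blockSize: number = BLOCK_SIZE): QuantizedBlocks {
  const numBlocks = Math.ceil(input.length / blockSize);
  const scales = new Float32Array(numBlocks);
  const values = new Int8Array(numBlocks * blockSize); // tail stays zero-padded

  for (let b = 0; b < numBlocks; b++) {
    const start = b * blockSize;
    const end = Math.min(start + blockSize, input.length);

    // The largest magnitude in the block maps to +/-127.
    let maxAbs = 0;
    for (let i = start; i < end; i++) {
      maxAbs = Math.max(maxAbs, Math.abs(input[i]));
    }
    scales[b] = maxAbs / 127;
    const scale = scales[b]; // read back the float32-rounded scale
    if (scale === 0) {
      continue; // all-zero block: values are already zero
    }
    for (let i = start; i < end; i++) {
      const q = Math.round(input[i] / scale);
      values[i] = Math.max(-127, Math.min(127, q));
    }
  }
  return { scales, values, length: input.length };
}

function dequantizeQ8(q: QuantizedBlocks, blockSize: number = BLOCK_SIZE): Float32Array {
  const out = new Float32Array(q.length);
  for (let i = 0; i < q.length; i++) {
    out[i] = q.values[i] * q.scales[Math.floor(i / blockSize)];
  }
  return out;
}
```

In this layout a full block costs 36 bytes (32 one-byte values plus a 4-byte scale) instead of 128 bytes as float32, and the round-trip error per element is bounded by about half the block's scale.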

Maintenance & Community

The README describes the project as in active development and invites community contributions.
Community discussion takes place on Discord, and a roadmap is available.

Licensing & Compatibility

License is not explicitly stated in the README.

Limitations & Caveats

The README describes the project as in active development, with ongoing work on the engine, model support, and compatibility. A Rust crate and CLI are not yet released.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

7 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Wing Lian (Founder of Axolotl AI), and 1 more.

varuna by microsoft

Tool for efficient large DNN model training on commodity hardware

Created 4 years ago

Updated 1 year ago

Starred by Wing Lian (Founder of Axolotl AI) and Zhuohan Li (Coauthor of vLLM).

calm by zeux

Single-GPU inference engine for rapid LLM prototyping

Created 2 years ago

Updated 7 months ago

Starred by Meng Zhang (Cofounder of TabbyML).

crabml by crabml

Llama.cpp compatible inference engine in Rust

Created 2 years ago

Updated 1 year ago

Starred by Yaowei Zheng (Author of LLaMA-Factory) and Ying Sheng (Coauthor of SGLang).

GPTQModel by ModelCloud

LLM compression toolkit for accelerated CPU/GPU inference

Created 1 year ago

Updated 1 day ago

bolt by huawei-noah

Deep learning library for high-performance, heterogeneous deployment

Created 6 years ago

Updated 9 months ago

awesome-emdl by csarron

EMDL resources for efficient on-device deep learning research

Created 8 years ago

Updated 2 years ago

ik_llama.cpp by ikawrakow

`llama.cpp` fork for improved CPU/GPU performance

Created 1 year ago

Updated 1 day ago

Starred by Yaowei Zheng (Author of LLaMA-Factory) and Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA).

pyllama by henrywoo

Hacked LLaMA version for single consumer-grade GPU inference

Created 2 years ago

Updated 2 years ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Ying Sheng (Coauthor of SGLang).

fastllm by ztxz16

High-performance C++ LLM inference library

Created 2 years ago

Updated 1 month ago

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

PyTorch implementation for Google's Gemma models

Created 1 year ago

Updated 7 months ago

Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai

Framework for LLM inference optimization experimentation

Created 1 year ago

Updated 1 day ago

Starred by Chaoyu Yang (Founder of Bento).

ncnn by Tencent

Mobile-first inference framework for neural networks

Created 8 years ago

Updated 2 days ago
