ggml  by ggml-org

Tensor library for machine learning

Created 3 years ago
13,176 stars

Top 3.8% on SourcePulse

Project Summary

ggml is a C tensor library designed for machine learning, focusing on efficient execution across diverse hardware. It targets developers and researchers needing a low-level, dependency-free tensor computation engine for inference and training, particularly on resource-constrained devices. The library's key benefit is its portability and performance through features like integer quantization and broad hardware acceleration.

How It Works

ggml employs a C-based, low-level implementation for maximum portability and minimal overhead. It supports integer quantization to reduce model size and memory bandwidth, enabling faster inference on CPUs and GPUs. The library handles automatic differentiation and includes optimizers such as ADAM and L-BFGS, supporting both inference and training workflows. A notable design choice is that no memory is allocated during runtime: all working memory is reserved up front in a context, which keeps performance predictable.

Quick Start & Requirements

  • Install: Clone the repository, set up a Python virtual environment, and install dependencies with pip install -r requirements.txt.
  • Build: Use CMake: mkdir build && cd build && cmake .. && cmake --build . --config Release -j 8.
  • Prerequisites: C++ compiler, CMake, Python 3.10+. Optional: CUDA 12.1+ for GPU acceleration, hipBLAS for AMD GPUs, Intel oneAPI for SYCL. Android development requires the NDK.
  • Resources: Introduction to ggml, GGUF file format.
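Once the build completes, a minimal program gives a feel for the API. The sketch below follows the pattern from the project's introductory material (define tensors in a preallocated context, build a graph, then compute it); it must be compiled and linked against the built library, and exact function availability depends on the ggml revision you check out.

```c
#include <stdio.h>
#include "ggml.h"

int main(void) {
    // reserve all working memory up front -- ggml does not allocate at runtime
    struct ggml_init_params params = {
        .mem_size   = 16 * 1024 * 1024,  // 16 MiB arena
        .mem_buffer = NULL,              // let ggml allocate the arena itself
        .no_alloc   = false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // build the graph for f = a*x*x + b
    struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
    struct ggml_tensor * f = ggml_add(ctx, ggml_mul(ctx, a, ggml_mul(ctx, x, x)), b);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, f);

    // set input values, then evaluate the graph
    ggml_set_f32(x, 2.0f);
    ggml_set_f32(a, 3.0f);
    ggml_set_f32(b, 4.0f);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads=*/1);

    printf("f = %f\n", ggml_get_f32_1d(f, 0));  // 3*2*2 + 4 = 16
    ggml_free(ctx);
    return 0;
}
```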

Highlighted Details

  • Broad hardware support including CPU, CUDA, hipBLAS, and SYCL.
  • Integer quantization support for efficient inference.
  • Automatic differentiation and optimizers (ADAM, L-BFGS).
  • Zero memory allocations during runtime.

Maintenance & Community

Development is active. ggml also serves as the foundation for related projects such as llama.cpp and whisper.cpp, which see significant contribution activity.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The project is under active development, implying potential for breaking changes. Specific hardware acceleration configurations (CUDA, hipBLAS, SYCL) require careful setup and may have version dependencies.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 8
  • Issues (30d): 5
  • Star history: 166 stars in the last 30 days

Explore Similar Projects

Starred by Jeremy Howard (Cofounder of fast.ai), Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech), and 1 more.

GPTQ-triton by fpgaminer (307 stars)

Triton kernel for GPTQ inference, improving context scaling
Created 2 years ago; updated 2 years ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google (6k stars)

PyTorch implementation for Google's Gemma models
Created 1 year ago; updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA (6k stars)

Optimized transformer library for inference
Created 4 years ago; updated 1 year ago