gemma.cpp by Google

C++ inference engine for Google's Gemma models

created 1 year ago
6,516 stars

Top 8.0% on sourcepulse

Project Summary

This C++ inference engine provides a lightweight, standalone solution for running Google's Gemma, RecurrentGemma, and PaliGemma models. It targets researchers and developers who need direct control over LLM computation, offering a minimalist implementation (~2K LoC core) inspired by projects like ggml and llama.c. The engine uses portable SIMD via the Google Highway library for efficient CPU inference.

How It Works

The engine implements Gemma models in straightforward C++, avoiding the abstraction layers common in Python frameworks. It uses the Google Highway library for portable SIMD, enabling efficient inference on CPUs. Model weights can be loaded in several formats, including 8-bit switched floating point (-sfp) for lower memory use and faster inference, and bfloat16 for higher fidelity.
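As a rough illustration of the portable-SIMD style that Highway enables (a simplified static-dispatch sketch, not gemma.cpp's actual kernels; the DotProduct name and the assumption that size is a multiple of the lane count are ours):

    #include <cstddef>
    #include "hwy/highway.h"

    namespace hn = hwy::HWY_NAMESPACE;

    // Dot product of two float arrays whose length is a multiple of the
    // SIMD lane count (a real kernel would also handle the remainder).
    float DotProduct(const float* a, const float* b, size_t size) {
      const hn::ScalableTag<float> d;  // widest available float vector
      auto acc = hn::Zero(d);
      for (size_t i = 0; i < size; i += hn::Lanes(d)) {
        const auto va = hn::LoadU(d, a + i);  // unaligned load
        const auto vb = hn::LoadU(d, b + i);
        acc = hn::MulAdd(va, vb, acc);        // acc += va * vb
      }
      return hn::ReduceSum(d, acc);           // horizontal sum across lanes
    }

The same source compiles to SSE4, AVX2, AVX-512, NEON, or SVE targets without per-target intrinsics, which is part of what keeps the core implementation small.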

Quick Start & Requirements

  • Install: Build using CMake (cmake -B build && cmake --build --preset make).
  • Prerequisites: CMake, Clang C++17 compiler, tar. Windows requires Visual Studio 2022 Build Tools with LLVM/Clang.
  • Model Weights: Download from Kaggle or Hugging Face Hub (e.g., gemma-2b-it-sfp).
  • Run: ./gemma --tokenizer <tokenizer_file> --weights <weights_file> --model <model_name> (a consolidated build-and-run sequence is sketched after this list).
  • Docs: ai.google.dev/gemma
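
Putting the bullets together, a typical first run looks like this (archive and file names are placeholders; use the paths from the weights bundle you actually downloaded):

    # configure and build from the repository root
    cmake -B build && cmake --build --preset make

    # unpack the weights archive downloaded from Kaggle or Hugging Face
    tar -xf <downloaded_archive>

    # interactive inference
    ./gemma --tokenizer <tokenizer_file> --weights <weights_file> --model <model_name>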

Highlighted Details

  • Supports Gemma (1, 2, 3), RecurrentGemma, and PaliGemma (VLM) models.
  • Offers -sfp (8-bit switched floating point) weights for performance.
  • Can be incorporated as a library in CMake projects using FetchContent (see the CMake sketch after this list).
  • Includes a tool for migrating weights to a single-file format.
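
A minimal sketch of consuming gemma.cpp via FetchContent in a downstream CMakeLists.txt; the GIT_TAG and the libgemma target name are assumptions, so check the upstream CMake files for the exact targets to link:

    include(FetchContent)

    # Fetch gemma.cpp from GitHub (pinning a specific commit is advisable).
    FetchContent_Declare(gemma
      GIT_REPOSITORY https://github.com/google/gemma.cpp
      GIT_TAG origin/main)
    FetchContent_MakeAvailable(gemma)

    add_executable(my_app main.cc)
    # `libgemma` is the assumed library target exported by gemma.cpp's build.
    target_link_libraries(my_app PRIVATE libgemma)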

Maintenance & Community

  • Active development on the dev branch.
  • Community contributions welcome; Discord server available.
  • Key contributors include Austin Huang, Jan Wassenberg, Phil Culliton, Paul Chang, and Dan Zheng.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

  • Building on Windows is recommended via WSL; native Windows support is still being explored.
  • CLI usage is experimental and may have context length limitations.
  • Image reading for PaliGemma is basic, currently supporting only binary PPM (P6) format.
Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 20
  • Issues (30d): 1
  • Star History: 167 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

  • Top 0.4% on sourcepulse · 84k stars
  • C/C++ library for local LLM inference
  • Created 2 years ago; updated 14 hours ago