Java SDK for local LLaMA model inference
This project provides Java bindings for llama.cpp, enabling efficient inference of large language models like LLaMA and Gemma directly from Java applications. It targets Java developers seeking to integrate LLM capabilities without relying on external services or complex Python environments.
How It Works
The library leverages JNI (Java Native Interface) to bridge Java code with the C/C++ core of llama.cpp. This allows model inference to execute directly on the CPU or GPU (via CUDA or Metal, depending on the llama.cpp build flags) within the JVM. The architecture supports streaming output, context management, and configuration of inference parameters such as temperature and grammar.
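As a rough illustration, the snippet below shows what streaming inference typically looks like with these bindings. It is a minimal sketch, not the project's canonical example: the classes LlamaModel, ModelParameters, and InferenceParameters follow the documented API, but the exact setter names used here (for the model path, GPU layer offload, and temperature) are assumptions that may differ between versions, so check the README for 4.1.0.

```java
import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaOutput;
import de.kherud.llama.ModelParameters;

public class StreamingExample {
    public static void main(String[] args) {
        // Assumed setter names; verify against the 4.1.0 API.
        ModelParameters modelParams = new ModelParameters()
                .setModel("models/llama-3-8b-instruct.Q4_K_M.gguf") // hypothetical local GGUF path
                .setGpuLayers(32);                                  // only effective with a CUDA/Metal build

        // LlamaModel implements AutoCloseable: try-with-resources frees native
        // llama.cpp memory that the JVM garbage collector cannot see.
        try (LlamaModel model = new LlamaModel(modelParams)) {
            InferenceParameters inferParams = new InferenceParameters("Tell me a joke.")
                    .setTemperature(0.7f);

            // Tokens are streamed back from the native code as they are generated.
            for (LlamaOutput output : model.generate(inferParams)) {
                System.out.print(output);
            }
        }
    }
}
```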
Quick Start & Requirements
Add the Maven dependency:

```xml
<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>4.1.0</version>
</dependency>
```
For GPU acceleration or other custom builds, compile the native llama.cpp library with appropriate flags (e.g., -DGGML_CUDA=ON).

Highlighted Details
- Custom llama.cpp build arguments can be passed to CMake.
- Models can be downloaded directly via ModelParameters#setModelUrl(), as sketched below.
- Implements AutoCloseable for proper native memory management.
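Two of these details combine naturally: a model can be fetched by URL and its native memory released deterministically. The sketch below assumes setModelUrl() downloads the referenced GGUF file before loading; the URL is a placeholder, not a real model location.

```java
import de.kherud.llama.LlamaModel;
import de.kherud.llama.ModelParameters;

public class UrlModelExample {
    public static void main(String[] args) {
        // setModelUrl() is part of the documented API; the URL here is a placeholder.
        ModelParameters params = new ModelParameters()
                .setModelUrl("https://example.com/models/placeholder.gguf");

        // LlamaModel is AutoCloseable: try-with-resources calls close(), releasing
        // native llama.cpp allocations the JVM garbage collector cannot reclaim.
        try (LlamaModel model = new LlamaModel(params)) {
            // ... run inference as in the previous example ...
        }
    }
}
```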
Maintenance & Community

The project is maintained by kherud. Community channels are not explicitly mentioned in the README.
Licensing & Compatibility
The project appears to be distributed under the MIT License, based on the llama.cpp dependency. This license is permissive and generally compatible with commercial and closed-source applications.
Limitations & Caveats
Custom builds or GPU acceleration require manual compilation of the native llama.cpp library, which can be complex. The README notes that llama.cpp allocates memory not managed by the JVM, necessitating careful use of AutoCloseable to prevent leaks. Android integration requires specific Gradle configurations.