java-llama.cpp by kherud

Java SDK for local LLaMA model inference

Created 2 years ago
381 stars

Top 74.8% on SourcePulse

Project Summary

This project provides Java bindings for llama.cpp, enabling efficient inference of large language models like LLaMA and Gemma directly from Java applications. It targets Java developers seeking to integrate LLM capabilities without relying on external services or complex Python environments.

How It Works

The library uses JNI (Java Native Interface) to bridge Java code with the C/C++ core of llama.cpp, so model inference runs in-process on the CPU or GPU (via CUDA or Metal, depending on llama.cpp build flags) without leaving the JVM. The API supports streaming output, context management, and configuration of inference parameters such as temperature and grammar.
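
To make the shape of that API concrete, here is a minimal sketch of streaming inference, loosely modeled on the project's README examples; the model path is a placeholder, and method names such as setModel and generate may differ between library versions:

    import de.kherud.llama.InferenceParameters;
    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.LlamaOutput;
    import de.kherud.llama.ModelParameters;

    public class StreamingExample {
        public static void main(String[] args) {
            // Placeholder path; point this at any local GGUF model file.
            ModelParameters modelParams = new ModelParameters()
                    .setModel("models/llama-3-8b-instruct.Q4_K_M.gguf");
            // try-with-resources frees the natively allocated model memory.
            try (LlamaModel model = new LlamaModel(modelParams)) {
                InferenceParameters inferParams = new InferenceParameters("Tell me a joke.")
                        .setTemperature(0.7f);
                // generate() yields output incrementally as tokens are produced.
                for (LlamaOutput output : model.generate(inferParams)) {
                    System.out.print(output);
                }
            }
        }
    }

Because generation is streamed, output can be displayed as it arrives rather than only after the full completion finishes.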

Quick Start & Requirements

  • Maven Dependency:
    <dependency>
        <groupId>de.kherud</groupId>
        <artifactId>llama</artifactId>
        <version>4.1.0</version>
    </dependency>
    
  • Prerequisites: For out-of-the-box CPU inference, supported platforms include Linux (x86-64, aarch64), macOS (x86-64, aarch64), and Windows (x86-64). GPU acceleration requires compiling llama.cpp with the appropriate flags (e.g., -DGGML_CUDA=ON).
  • Setup: No setup is required on the supported CPU platforms. For custom builds or GPU acceleration, the native library must be compiled via Maven and CMake (see the sketch after this list).
  • Documentation: Examples
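
Once a GPU-enabled native build is in place, offloading is configured at runtime. A hedged sketch, assuming the setGpuLayers option shown in the project's examples (the path and layer count are placeholders):

    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    public class GpuOffloadExample {
        public static void main(String[] args) {
            // Assumes the native library was compiled with GPU support,
            // e.g. with -DGGML_CUDA=ON passed through to CMake.
            ModelParameters params = new ModelParameters()
                    .setModel("models/llama-3-8b-instruct.Q4_K_M.gguf") // placeholder path
                    .setGpuLayers(43); // number of layers to offload to the GPU
            try (LlamaModel model = new LlamaModel(params)) {
                // Inference now runs with the offloaded layers on the GPU.
            }
        }
    }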

Highlighted Details

  • Supports Gemma 3 and other GGUF-compatible models.
  • Enables GPU acceleration by passing llama.cpp build arguments to CMake.
  • Allows model downloading via Java code using ModelParameters#setModelUrl() (see the sketch after this list).
  • Provides options for custom shared library locations and system library installation.
  • Implements AutoCloseable for proper native memory management.
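
Two of these details combine naturally: loading a model by URL and relying on try-with-resources for cleanup. A minimal sketch; the URL is a placeholder, and the complete method is assumed from the project's examples:

    import de.kherud.llama.InferenceParameters;
    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    public class DownloadExample {
        public static void main(String[] args) {
            // Placeholder URL; setModelUrl() fetches the GGUF file before loading it.
            ModelParameters params = new ModelParameters()
                    .setModelUrl("https://example.com/some-model.Q4_K_M.gguf");
            // LlamaModel implements AutoCloseable, so try-with-resources
            // releases the native memory even if an exception is thrown.
            try (LlamaModel model = new LlamaModel(params)) {
                System.out.println(model.complete(new InferenceParameters("Hello!")));
            }
        }
    }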

Maintenance & Community

The project is maintained by kherud. Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

The project appears to be distributed under the MIT License, in line with its llama.cpp dependency. MIT is permissive and generally compatible with commercial and closed-source use.

Limitations & Caveats

Custom builds or GPU acceleration require manual compilation of the native llama.cpp library, which can be complex. The README notes that llama.cpp allocates memory not managed by the JVM, necessitating careful use of AutoCloseable to prevent leaks. Android integration requires specific Gradle configurations.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days
