java-llama.cpp by kherud

Java SDK for local LLaMA model inference

created 1 year ago
374 stars

Top 76.9% on sourcepulse

View on GitHub
Project Summary

This project provides Java bindings for llama.cpp, enabling efficient inference of large language models like LLaMA and Gemma directly from Java applications. It targets Java developers seeking to integrate LLM capabilities without relying on external services or complex Python environments.

How It Works

The library uses JNI (Java Native Interface) to bridge Java code with the C/C++ core of llama.cpp, so model inference runs in-process on the CPU or GPU (via CUDA or Metal, depending on llama.cpp build flags) rather than through an external server. The architecture supports streaming output, context management, and configuration of inference parameters such as temperature and grammar-constrained sampling.
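
As a rough sketch of that flow, the snippet below loads a GGUF model and streams tokens for a prompt. The class and method names (LlamaModel, ModelParameters, InferenceParameters, LlamaOutput, setModel, setTemperature, generate) follow the project's README-style builder API but are assumptions here and may differ between versions; the model path is a placeholder.

    import de.kherud.llama.InferenceParameters;
    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.LlamaOutput;
    import de.kherud.llama.ModelParameters;

    public class StreamingExample {
        public static void main(String[] args) throws Exception {
            // Placeholder path; point this at any GGUF model on disk.
            ModelParameters modelParams = new ModelParameters()
                    .setModel("models/model.gguf"); // assumed setter name

            // try-with-resources frees the native memory that lives outside the JVM heap.
            try (LlamaModel model = new LlamaModel(modelParams)) {
                InferenceParameters inferParams = new InferenceParameters("Explain JNI in one paragraph.")
                        .setTemperature(0.7f); // assumed setter name

                // generate() is assumed to yield partial outputs as the native code produces tokens.
                for (LlamaOutput output : model.generate(inferParams)) {
                    System.out.print(output);
                }
            }
        }
    }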

Quick Start & Requirements

  • Maven Dependency:
    <dependency>
        <groupId>de.kherud</groupId>
        <artifactId>llama</artifactId>
        <version>4.1.0</version>
    </dependency>
    
  • Prerequisites: For out-of-the-box CPU inference, supported platforms include Linux (x86-64, aarch64), macOS (x86-64, aarch64), and Windows (x86-64). GPU acceleration requires compiling llama.cpp with the appropriate flags (e.g., -DGGML_CUDA=ON); a configuration sketch follows this list.
  • Setup: No setup is required on supported CPU platforms. For custom builds or GPU acceleration, the native library must be compiled via Maven and CMake.
  • Documentation: Examples
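
If the native library has been rebuilt with GPU support (e.g., -DGGML_CUDA=ON), offloading work to the GPU is configured per model load. A minimal sketch, assuming a setGpuLayers setter analogous to llama.cpp's --n-gpu-layers option; the setter names and the model path are illustrative, not confirmed API.

    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    public class GpuOffloadExample {
        public static void main(String[] args) throws Exception {
            // Only effective with a native build that has GPU support;
            // on a CPU-only build inference simply stays on the CPU.
            ModelParameters params = new ModelParameters()
                    .setModel("models/model.gguf") // placeholder GGUF path
                    .setGpuLayers(43);             // assumed setter; number of layers to offload

            try (LlamaModel model = new LlamaModel(params)) {
                System.out.println("Model loaded with GPU offloading configured.");
            }
        }
    }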

Highlighted Details

  • Supports Gemma 3 and other GGUF-compatible models.
  • Enables GPU acceleration by passing llama.cpp build arguments to CMake.
  • Allows model downloading via Java code using ModelParameters#setModelUrl() (see the sketch after this list).
  • Provides options for custom shared library locations and system library installation.
  • Implements AutoCloseable for proper native memory management.

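The sketch below ties together the setModelUrl() and AutoCloseable bullets: a model fetched by URL and cleaned up via try-with-resources. ModelParameters#setModelUrl() is named in the README; the URL, the complete() call, and the other identifiers are illustrative assumptions.

    import de.kherud.llama.InferenceParameters;
    import de.kherud.llama.LlamaModel;
    import de.kherud.llama.ModelParameters;

    public class DownloadAndClose {
        public static void main(String[] args) throws Exception {
            ModelParameters params = new ModelParameters()
                    // Hypothetical URL; per the README the library can download the GGUF file itself.
                    .setModelUrl("https://example.com/models/model.gguf");

            // LlamaModel implements AutoCloseable, so the native (non-JVM) memory is released on close().
            try (LlamaModel model = new LlamaModel(params)) {
                // complete() is an assumed blocking call that returns the full answer as a String.
                System.out.println(model.complete(new InferenceParameters("Hello!")));
            }
        }
    }
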
Maintenance & Community

The project is maintained by kherud. Community channels are not explicitly mentioned in the README.

Licensing & Compatibility

The project is distributed under the MIT License (matching the underlying llama.cpp), which is permissive and generally compatible with commercial and closed-source applications.

Limitations & Caveats

Custom builds or GPU acceleration require manual compilation of the native llama.cpp library, which can be complex. The README notes that llama.cpp allocates memory not managed by the JVM, necessitating careful use of AutoCloseable to prevent leaks. Android integration requires specific Gradle configurations.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 25 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Anil Dash (former CEO of Glitch), and 15 more.

llamafile by Mozilla-Ocho

Top 0.2% on sourcepulse · 23k stars
Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc
created 1 year ago · updated 1 month ago