Java SDK for local LLaMA model inference
This project provides Java bindings for llama.cpp, enabling efficient inference of large language models like LLaMA and Gemma directly from Java applications. It targets Java developers seeking to integrate LLM capabilities without relying on external services or complex Python environments.
How It Works
The library leverages JNI (Java Native Interface) to bridge Java code with the C/C++ core of llama.cpp. This allows model inference to execute directly on the CPU or GPU (via CUDA or Metal, depending on the llama.cpp build flags) within the JVM. The architecture supports streaming output, context management, and configuration of inference parameters such as temperature and grammar.
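As a rough illustration, the snippet below shows what streaming inference typically looks like with these bindings. It is a minimal sketch, not the project's canonical example: the classes LlamaModel, ModelParameters, and InferenceParameters follow the documented API, but the exact setter names used here (for the model path, GPU layer offload, and temperature) are assumptions that may differ between versions, so check the README for 4.1.0.

```java
import de.kherud.llama.InferenceParameters;
import de.kherud.llama.LlamaModel;
import de.kherud.llama.LlamaOutput;
import de.kherud.llama.ModelParameters;

public class StreamingExample {
    public static void main(String[] args) {
        // Assumed setter names; verify against the 4.1.0 API.
        ModelParameters modelParams = new ModelParameters()
                .setModel("models/llama-3-8b-instruct.Q4_K_M.gguf") // hypothetical local GGUF path
                .setGpuLayers(32);                                  // only effective with a CUDA/Metal build

        // LlamaModel implements AutoCloseable: try-with-resources frees native
        // llama.cpp memory that the JVM garbage collector cannot see.
        try (LlamaModel model = new LlamaModel(modelParams)) {
            InferenceParameters inferParams = new InferenceParameters("Tell me a joke.")
                    .setTemperature(0.7f);

            // Tokens are streamed back from the native code as they are generated.
            for (LlamaOutput output : model.generate(inferParams)) {
                System.out.print(output);
            }
        }
    }
}
```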
Quick Start & Requirements
Add the Maven dependency:

```xml
<dependency>
    <groupId>de.kherud</groupId>
    <artifactId>llama</artifactId>
    <version>4.1.0</version>
</dependency>
```
For GPU acceleration or other custom builds, compile the native llama.cpp library with appropriate flags (e.g., -DGGML_CUDA=ON).

Highlighted Details
- Custom llama.cpp build arguments can be passed to CMake.
- Models can be downloaded directly via ModelParameters#setModelUrl(), as sketched below.
- Implements AutoCloseable for proper native memory management.
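Two of these details combine naturally: a model can be fetched by URL and its native memory released deterministically. The sketch below assumes setModelUrl() downloads the referenced GGUF file before loading; the URL is a placeholder, not a real model location.

```java
import de.kherud.llama.LlamaModel;
import de.kherud.llama.ModelParameters;

public class UrlModelExample {
    public static void main(String[] args) {
        // setModelUrl() is part of the documented API; the URL here is a placeholder.
        ModelParameters params = new ModelParameters()
                .setModelUrl("https://example.com/models/placeholder.gguf");

        // LlamaModel is AutoCloseable: try-with-resources calls close(), releasing
        // native llama.cpp allocations the JVM garbage collector cannot reclaim.
        try (LlamaModel model = new LlamaModel(params)) {
            // ... run inference as in the previous example ...
        }
    }
}
```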
Maintenance & Community

The project is maintained by kherud. Community channels are not explicitly mentioned in the README.
Licensing & Compatibility
The project appears to be distributed under the MIT License, based on the llama.cpp dependency. This license is permissive and generally compatible with commercial and closed-source applications.
Limitations & Caveats
Custom builds or GPU acceleration require manual compilation of the native llama.cpp library, which can be complex. The README notes that llama.cpp allocates memory not managed by the JVM, necessitating careful use of AutoCloseable to prevent leaks. Android integration requires specific Gradle configurations.