Jlama by tjake

LLM inference engine for Java applications

Created 2 years ago
1,235 stars

Top 31.8% on SourcePulse

Project Summary

Jlama provides a modern LLM inference engine for Java developers, enabling direct integration of large language models into Java applications. It supports a wide range of popular LLM architectures, including Mixture-of-Experts models, along with inference features such as Paged Attention, targeting developers who need to run LLMs within the Java ecosystem.

How It Works

Jlama leverages Java 21's Vector API for optimized inference performance. It supports various model formats, including Hugging Face's SafeTensors, and offers quantization (Q8, Q4) and precision options (F32, F16, BF16). The engine implements advanced techniques like Paged Attention and Mixture of Experts, aiming for efficient and scalable LLM execution within the JVM.
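
As a concrete sketch, the snippet below downloads a quantized model and runs generation through Jlama's Java API. It follows the shape of the project's documented sample, but the model id, package layout, and the Generator.Response type are assumptions here and may differ across versions:

    import com.github.tjake.jlama.model.AbstractModel;
    import com.github.tjake.jlama.model.ModelSupport;
    import com.github.tjake.jlama.model.functions.Generator;
    import com.github.tjake.jlama.safetensors.DType;
    import com.github.tjake.jlama.safetensors.SafeTensorSupport;
    import com.github.tjake.jlama.safetensors.prompt.PromptContext;

    import java.io.File;
    import java.util.UUID;

    public class JlamaSample {
        public static void main(String[] args) throws Exception {
            // Example model id (an assumption); any supported SafeTensors model should work.
            String model = "tjake/Llama-3.2-1B-Instruct-JQ4";

            // Download the model to ./models, or reuse an earlier download.
            File localModelPath = SafeTensorSupport.maybeDownloadModel("./models", model);

            // Compute in F32 while holding Q8-quantized weights (I8) in memory.
            AbstractModel m = ModelSupport.loadModel(localModelPath, DType.F32, DType.I8);

            // Use the model's chat template when it ships one, else a raw prompt.
            String prompt = "What is the best season to plant avocados?";
            PromptContext ctx = m.promptSupport().isPresent()
                    ? m.promptSupport().get().builder().addUserMessage(prompt).build()
                    : PromptContext.of(prompt);

            // Generate up to 256 tokens at temperature 0, streaming tokens to stdout.
            Generator.Response r =
                    m.generate(UUID.randomUUID(), ctx, 0.0f, 256, (token, time) -> System.out.print(token));
            System.out.println(r.responseText);
        }
    }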

Quick Start & Requirements

  • CLI: Install via jbang app install --force jlama@tjake, then serve a model over HTTP with jlama restapi <model_name>.
  • Java Project: Requires Java 21 or later. Enable the incubator Vector API and preview features with export JDK_JAVA_OPTIONS="--add-modules jdk.incubator.vector --enable-preview". Add the jlama-core and jlama-native Maven dependencies (see the snippet after this list).
  • JBang (required for the CLI install): https://www.jbang.dev/download/
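
A Maven setup along these lines is a reasonable starting point. The com.github.tjake coordinates follow the project's artifact names, but the version property and the per-OS classifier for jlama-native are placeholders to fill in (the classifier is commonly derived via the os-maven-plugin):

    <!-- Versions and the native classifier are illustrative; pin to the latest release. -->
    <dependency>
      <groupId>com.github.tjake</groupId>
      <artifactId>jlama-core</artifactId>
      <version>${jlama.version}</version>
    </dependency>
    <dependency>
      <groupId>com.github.tjake</groupId>
      <artifactId>jlama-native</artifactId>
      <!-- jlama-native ships per-OS binaries; a classifier such as linux-x86_64
           selects the right one for the build machine. -->
      <classifier>${os.detected.name}-${os.detected.arch}</classifier>
      <version>${jlama.version}</version>
    </dependency>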

Highlighted Details

  • Supports Gemma, Llama, Mistral, Mixtral, Qwen2, Granite, and GPT-2 models.
  • Implements Paged Attention, Mixture of Experts, and Tool Calling.
  • Offers an OpenAI-compatible REST API and distributed inference capabilities (see the REST sketch after this list).
  • Supports Hugging Face SafeTensors, various data types, and quantization (Q8, Q4).
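
Since the REST API is OpenAI-compatible, any OpenAI-style client can talk to it. Below is a minimal sketch using the JDK's built-in HttpClient, assuming a server started with jlama restapi listening on localhost:8080 and the standard /v1/chat/completions path; the port, base path, and model placeholder are assumptions, not documented values:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class JlamaRestSample {
        public static void main(String[] args) throws Exception {
            // OpenAI-style chat completion request. A single-model server may
            // ignore the "model" field; it is included for schema compatibility.
            String body = """
                    {
                      "model": "<model_name>",
                      "messages": [
                        {"role": "user", "content": "Say hello from Jlama"}
                      ]
                    }
                    """;

            // localhost:8080 and the /v1 prefix are assumptions; adjust to
            // wherever `jlama restapi` is actually serving.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/v1/chat/completions"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            // The JSON response follows OpenAI's chat-completions schema.
            System.out.println(response.body());
        }
    }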

Maintenance & Community

The project is maintained by T Jake Luciani. The roadmap includes support for more models, pure-Java tokenizers, LoRA, GraalVM, and enhanced distributed inference.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Requires Java 21 with the incubator Vector API module and preview features enabled, which may not be suitable for all production environments. The roadmap indicates that features such as GraalVM support are still under development.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 26 stars in the last 30 days

Explore Similar Projects

Starred by Ross Wightman (author of timm; CV at Hugging Face), Awni Hannun (author of MLX; Research Scientist at Apple), and 1 more.

mlx-llm by riccardomusmeci
0% · 459 stars
LLM tools/apps for Apple Silicon using MLX
Created 2 years ago · Updated 11 months ago

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Sebastian Raschka (author of "Build a Large Language Model (From Scratch)"), and 11 more.

optillm by algorithmicsuperintelligence
0.5% · 3k stars
Optimizing inference proxy for LLMs
Created 1 year ago · Updated 2 weeks ago

Starred by Jason Knight (Director AI Compilers at NVIDIA; cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 12 more.

mistral.rs by EricLBuehler
0.4% · 6k stars
LLM inference engine for blazing fast performance
Created 1 year ago · Updated 2 days ago

Starred by Lianmin Zheng (coauthor of SGLang and vLLM), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB
0.1% · 8k stars
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 1 year ago · Updated 3 months ago