Jlama by tjake

LLM inference engine for Java applications

created 2 years ago
1,130 stars

Top 34.6% on sourcepulse

Project Summary

Jlama provides a modern LLM inference engine for Java developers, enabling direct integration of large language models into Java applications. It supports a wide range of popular LLM architectures and features like Paged Attention and Mixture of Experts, targeting developers who need to leverage LLMs within the Java ecosystem.

How It Works

Jlama leverages Java 21's Vector API for optimized inference performance. It supports various model formats, including Hugging Face's SafeTensors, and offers quantization (Q8, Q4) and precision options (F32, F16, BF16). The engine implements advanced techniques like Paged Attention and Mixture of Experts, aiming for efficient and scalable LLM execution within the JVM.
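
The sketch below shows roughly what embedding Jlama in an application looks like, following the example API documented in the project README (SafeTensorSupport, ModelSupport, PromptContext). Exact package, class, and method names can shift between releases, so treat this as an outline to check against the version of jlama-core you depend on rather than a drop-in snippet.

```java
// Minimal sketch of embedding Jlama in a Java application.
// Names follow the project's documented example API; verify them
// against the jlama-core release you actually use.
import com.github.tjake.jlama.model.AbstractModel;
import com.github.tjake.jlama.model.ModelSupport;
import com.github.tjake.jlama.safetensors.DType;
import com.github.tjake.jlama.safetensors.SafeTensorSupport;
import com.github.tjake.jlama.safetensors.prompt.PromptContext;

import java.io.File;
import java.util.UUID;

public class JlamaSketch {
    public static void main(String[] args) throws Exception {
        String model = "tjake/Llama-3.2-1B-Instruct-JQ4"; // example quantized model name
        String workingDirectory = "./models";

        // Downloads the model from Hugging Face, or reuses a local copy.
        File localModelPath = SafeTensorSupport.maybeDownloadModel(workingDirectory, model);

        // Loads the model; the second DType controls working-memory quantization.
        AbstractModel m = ModelSupport.loadModel(localModelPath, DType.F32, DType.I8);

        String prompt = "What is the best season to plant avocados?";

        // Builds a chat-style prompt if the model supports chat templates.
        PromptContext ctx = m.promptSupport().isPresent()
                ? m.promptSupport().get().builder()
                    .addSystemMessage("You are a helpful assistant who writes short responses.")
                    .addUserMessage(prompt)
                    .build()
                : PromptContext.of(prompt);

        // Generates up to 256 tokens at temperature 0.3; the last argument is a
        // per-token streaming callback, ignored here.
        var response = m.generate(UUID.randomUUID(), ctx, 0.3f, 256, (token, time) -> {});
        System.out.println(response.responseText);
    }
}
```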

Quick Start & Requirements

  • CLI: Install via jbang app install --force jlama@tjake, then run a model with jlama restapi <model_name>.
  • Java Project: Requires Java 21 or later with the incubating Vector API and preview features enabled, e.g. export JDK_JAVA_OPTIONS="--add-modules jdk.incubator.vector --enable-preview" (a quick verification sketch follows this list). Add the jlama-core and jlama-native Maven dependencies.
  • JBang download (for the CLI route): https://www.jbang.dev/download/
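
Because Jlama relies on the incubating Vector API, a quick way to confirm your JVM flags are set correctly is to run a tiny jdk.incubator.vector program such as the one below. This is plain JDK code, not part of Jlama; if the module is not enabled, it fails at startup rather than at model-load time.

```java
// Quick check that the JVM was started with
// --add-modules jdk.incubator.vector --enable-preview on Java 21+.
// Plain JDK code, not part of Jlama.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

public class VectorApiCheck {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f};
        float[] b = {8f, 7f, 6f, 5f, 4f, 3f, 2f, 1f};

        float dot = 0f;
        int i = 0;
        // Vectorized loop over full lanes of the preferred species width.
        for (; i <= a.length - SPECIES.length(); i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            dot += va.mul(vb).reduceLanes(VectorOperators.ADD);
        }
        // Scalar tail for any remaining elements.
        for (; i < a.length; i++) {
            dot += a[i] * b[i];
        }

        System.out.println("Vector API available, lane count = " + SPECIES.length());
        System.out.println("dot = " + dot); // expected 120.0
    }
}
```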

Highlighted Details

  • Supports Gemma, Llama, Mistral, Mixtral, Qwen2, Granite, and GPT-2 models.
  • Implements Paged Attention, Mixture of Experts, and Tool Calling.
  • Offers an OpenAI-compatible REST API and distributed inference capabilities (see the client sketch after this list).
  • Supports Hugging Face SafeTensors, various data types, and quantization (Q8, Q4).
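
Once the REST server is running (e.g. via jlama restapi <model_name>), any OpenAI-compatible client can talk to it; the sketch below uses only the JDK's built-in HttpClient. The base URL, port (8080), path (/chat/completions), and request body shape are assumptions based on the OpenAI chat-completions convention, so check the server's startup output and docs for the actual values.

```java
// Calls a locally running Jlama REST server with the JDK's HttpClient.
// The URL, port, path, and body shape are assumptions based on the
// OpenAI-compatible API; adjust to what the server actually exposes.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JlamaRestClient {
    public static void main(String[] args) throws Exception {
        // Some OpenAI-compatible servers also expect a "model" field;
        // a single-model Jlama server may not require it.
        String body = """
                {
                  "messages": [
                    {"role": "user", "content": "What is the best season to plant avocados?"}
                  ],
                  "temperature": 0.3
                }
                """;

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/chat/completions"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Prints the raw OpenAI-style JSON response.
        System.out.println(response.body());
    }
}
```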

Maintenance & Community

The project is maintained by T Jake Luciani. A roadmap includes support for more models, pure Java tokenizers, LoRA, GraalVM, and enhanced distributed inference.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Requires Java 21 with preview features enabled, which may not be suitable for all production environments. The roadmap indicates features like GraalVM support are still under development.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 88 stars in the last 90 days
