Jlama  by tjake

LLM inference engine for Java applications

Created 2 years ago
1,166 stars

Top 33.2% on SourcePulse

View on GitHub
Project Summary

Jlama provides a modern LLM inference engine for Java developers, enabling direct integration of large language models into Java applications. It supports a wide range of popular LLM architectures and features like Paged Attention and Mixture of Experts, targeting developers who need to leverage LLMs within the Java ecosystem.

How It Works

Jlama leverages Java 21's incubating Vector API (jdk.incubator.vector) for optimized inference performance. It supports various model formats, including Hugging Face's SafeTensors, and offers quantization (Q8, Q4) and precision options (F32, F16, BF16). The engine implements advanced techniques like Paged Attention and Mixture of Experts, aiming for efficient and scalable LLM execution within the JVM.
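To make the Q8 option above concrete, here is a minimal conceptual sketch of symmetric 8-bit quantization — the general idea, not Jlama's actual implementation: each block of float weights is mapped to int8 values plus a per-block scale, trading a little precision for roughly 4x less memory than F32.

```java
// Conceptual sketch of symmetric Q8 quantization (illustrative only):
// floats in a block are mapped onto [-127, 127] using one shared scale.
class Q8Sketch {
    // Quantize a block of floats to int8; the scale is returned via scaleOut[0]
    static byte[] quantize(float[] block, float[] scaleOut) {
        float max = 0f;
        for (float v : block) max = Math.max(max, Math.abs(v));
        float scale = max / 127f;            // one scale per block
        scaleOut[0] = scale;
        byte[] q = new byte[block.length];
        for (int i = 0; i < block.length; i++)
            q[i] = (byte) Math.round(block[i] / scale);
        return q;
    }

    // Recover approximate floats from the int8 values and the block scale
    static float[] dequantize(byte[] q, float scale) {
        float[] out = new float[q.length];
        for (int i = 0; i < q.length; i++) out[i] = q[i] * scale;
        return out;
    }
}
```

The round trip is lossy but close: the largest-magnitude weight in each block is reconstructed exactly, and the rest land within half a quantization step of their original values.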

Quick Start & Requirements

  • CLI: Install via jbang app install --force jlama@tjake. Run models with jlama restapi <model_name>.
  • Java Project: Requires Java 21 or later. Enable preview features with export JDK_JAVA_OPTIONS="--add-modules jdk.incubator.vector --enable-preview". Add Maven dependencies: jlama-core and jlama-native.
  • JBang (required for the CLI install): https://www.jbang.dev/download/
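The Maven setup above might look like the following pom.xml fragment. This is a sketch: the group ID and the os-maven-plugin-style classifier follow common convention for this project, and `${jlama.version}` is a placeholder — check the project README for the current coordinates and version.

```xml
<!-- Sketch of the jlama-core and jlama-native dependencies; version and
     classifier are placeholders to be confirmed against the README. -->
<dependency>
  <groupId>com.github.tjake</groupId>
  <artifactId>jlama-core</artifactId>
  <version>${jlama.version}</version>
</dependency>
<dependency>
  <groupId>com.github.tjake</groupId>
  <artifactId>jlama-native</artifactId>
  <!-- native bindings are platform-specific, hence the OS/arch classifier -->
  <classifier>${os.detected.name}-${os.detected.arch}</classifier>
  <version>${jlama.version}</version>
</dependency>
```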

Highlighted Details

  • Supports Gemma, Llama, Mistral, Mixtral, Qwen2, Granite, and GPT-2 models.
  • Implements Paged Attention, Mixture of Experts, and Tool Calling.
  • Offers OpenAI-compatible REST API and distributed inference capabilities.
  • Supports Hugging Face SafeTensors, various data types, and quantization (Q8, Q4).
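The Paged Attention feature listed above can be illustrated with a small conceptual sketch of its core data structure — this is the general idea, not Jlama's actual code. The KV cache is carved into fixed-size pages, and each sequence keeps a page table mapping logical token positions to physical pages, so memory grows on demand instead of being preallocated for the maximum context length.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of a paged KV cache (illustrative only):
// pages are allocated lazily as tokens arrive, and a page table
// translates logical token positions into physical page slots.
class PagedKvCacheSketch {
    static final int PAGE_SIZE = 16;                   // tokens per page
    final List<float[]> pagePool = new ArrayList<>();  // physical pages
    final List<Integer> pageTable = new ArrayList<>(); // logical -> physical
    final int headDim;
    int numTokens = 0;

    PagedKvCacheSketch(int headDim) { this.headDim = headDim; }

    // Append one token's key vector, allocating a new page when the
    // current one is full
    void append(float[] key) {
        if (numTokens % PAGE_SIZE == 0) {
            pagePool.add(new float[PAGE_SIZE * headDim]);
            pageTable.add(pagePool.size() - 1);
        }
        int page = pageTable.get(numTokens / PAGE_SIZE);
        int offset = (numTokens % PAGE_SIZE) * headDim;
        System.arraycopy(key, 0, pagePool.get(page), offset, headDim);
        numTokens++;
    }

    // Look up the key vector stored for a logical token position
    float[] get(int token) {
        int page = pageTable.get(token / PAGE_SIZE);
        int offset = (token % PAGE_SIZE) * headDim;
        float[] out = new float[headDim];
        System.arraycopy(pagePool.get(page), offset, out, 0, headDim);
        return out;
    }
}
```

Because pages need not be contiguous, a sequence of 40 tokens occupies just three 16-token pages here, and freed pages can be reused by other sequences — the property that makes the technique memory-efficient for serving many concurrent requests.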

Maintenance & Community

The project is maintained by T Jake Luciani. A roadmap includes support for more models, pure Java tokenizers, LoRA, GraalVM, and enhanced distributed inference.

Licensing & Compatibility

Licensed under the Apache License 2.0, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Requires Java 21 with preview features enabled, which may not be suitable for all production environments. The roadmap indicates features like GraalVM support are still under development.

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 6
  • Issues (30d): 2
  • Star History: 30 stars in the last 30 days

Explore Similar Projects

Starred by Ross Wightman (Author of timm; CV at Hugging Face), Awni Hannun (Author of MLX; Research Scientist at Apple), and 1 more.

mlx-llm by riccardomusmeci
454 stars
LLM tools/apps for Apple Silicon using MLX
Created 1 year ago, updated 7 months ago

Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Omar Sanseviero (DevRel at Google DeepMind), and 11 more.

mistral.rs by EricLBuehler
6k stars
LLM inference engine for blazing fast performance
Created 1 year ago, updated 1 day ago

Starred by Lianmin Zheng (Coauthor of SGLang, vLLM), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

MiniCPM by OpenBMB
8k stars
Ultra-efficient LLMs for end devices, achieving 5x+ speedup
Created 1 year ago, updated 1 week ago