Java library for Llama 3 inference
This project provides a practical implementation of Llama 3, 3.1, and 3.2 inference written directly in Java. It is aimed at developers and researchers who want to run LLMs on the JVM, particularly for exploring and optimizing compiler support for the JDK's incubating Vector API, for example in GraalVM. The primary benefit is efficient LLM inference in a pure Java environment that leverages modern JVM capabilities.
How It Works
The entire implementation lives in a single Java file. It parses the GGUF model format and uses a Llama 3 tokenizer based on minbpe. It supports Grouped-Query Attention and several quantization formats (F16, BF16, Q8_0, Q4_0), with matrix-vector multiplication optimized through Java's incubating Vector API. The project also provides GraalVM Native Image support for ahead-of-time compilation, plus AOT model pre-loading for near-instant time-to-first-token.
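To make the Vector API angle concrete, below is a minimal, hypothetical sketch of a SIMD matrix-vector kernel over plain float arrays. It is not the project's actual kernel (the real one also has to handle the quantized formats above), and the class and method names are invented for illustration.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    // Hypothetical sketch, not Llama3.java's kernel: computes y = W * x
    // with one SIMD accumulator per row. Requires the flag
    // --add-modules jdk.incubator.vector at compile and run time.
    public final class MatVecSketch {
        private static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

        static void matVec(float[] w, float[] x, float[] y, int rows, int cols) {
            for (int r = 0; r < rows; r++) {
                int base = r * cols;
                FloatVector acc = FloatVector.zero(S);
                int i = 0;
                for (int bound = S.loopBound(cols); i < bound; i += S.length()) {
                    FloatVector wv = FloatVector.fromArray(S, w, base + i);
                    FloatVector xv = FloatVector.fromArray(S, x, i);
                    acc = wv.fma(xv, acc); // lane-wise fused multiply-add
                }
                float sum = acc.reduceLanes(VectorOperators.ADD);
                for (; i < cols; i++) {
                    sum += w[base + i] * x[i]; // scalar tail
                }
                y[r] = sum;
            }
        }
    }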
Quick Start & Requirements
First download a GGUF model, for example:

    curl -L -O https://huggingface.co/mukel/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_0.gguf

Then run via jbang:

    jbang Llama3.java --help

or directly with the java launcher (Java 21+):

    java --enable-preview --source 21 --add-modules jdk.incubator.vector Llama3.java -i --model <model_path>
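For the GraalVM Native Image path mentioned above, an ahead-of-time build might look like the following sketch. These commands are an assumption for illustration; the repository's documented invocation and flags may differ.

    javac --enable-preview --release 21 --add-modules jdk.incubator.vector Llama3.java
    native-image --enable-preview --add-modules jdk.incubator.vector -cp . Llama3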
Highlighted Details
- Single-file Java implementation with GGUF parsing and a minbpe-based Llama 3 tokenizer
- Grouped-Query Attention and F16, BF16, Q8_0, and Q4_0 quantization
- Matrix-vector multiplication accelerated with the incubating Vector API
- GraalVM Native Image support with AOT model pre-loading
Maintenance & Community
The project is maintained by mukel. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
Requires Java 21+ with preview features enabled and the incubating jdk.incubator.vector module; some experimental VM features may require specific GraalVM builds. The license is not stated in this summary; consult the repository.
Limitations & Caveats
Performance tuning may require familiarity with JVM compiler optimizations and the Vector API, and preview and incubator APIs can change between JDK releases.