Java library for Llama 3 inference
This project provides a practical implementation of Llama 3, 3.1, and 3.2 inference written directly in Java. It is aimed at developers and researchers who want to run LLMs on the JVM, particularly for exploring and optimizing compiler support for the JDK's incubating Vector API, for example in GraalVM. The primary benefit is efficient LLM inference in a pure Java environment that leverages modern JVM capabilities.
How It Works
The entire implementation lives in a single Java file. It parses the GGUF model format and uses a Llama 3 tokenizer based on minbpe. It supports Grouped-Query Attention and several quantization formats (F16, BF16, Q8_0, Q4_0), with matrix-vector multiplication optimized through Java's incubating Vector API. The project also provides GraalVM Native Image support for ahead-of-time compilation, plus AOT model pre-loading for near-instant time-to-first-token.
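To make the Vector API angle concrete, below is a minimal, hypothetical sketch of a SIMD matrix-vector kernel over plain float arrays. It is not the project's actual kernel (the real one also has to handle the quantized formats above), and the class and method names are invented for illustration.

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    // Hypothetical sketch, not Llama3.java's kernel: computes y = W * x
    // with one SIMD accumulator per row. Requires the flag
    // --add-modules jdk.incubator.vector at compile and run time.
    public final class MatVecSketch {
        private static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

        static void matVec(float[] w, float[] x, float[] y, int rows, int cols) {
            for (int r = 0; r < rows; r++) {
                int base = r * cols;
                FloatVector acc = FloatVector.zero(S);
                int i = 0;
                for (int bound = S.loopBound(cols); i < bound; i += S.length()) {
                    FloatVector wv = FloatVector.fromArray(S, w, base + i);
                    FloatVector xv = FloatVector.fromArray(S, x, i);
                    acc = wv.fma(xv, acc); // lane-wise fused multiply-add
                }
                float sum = acc.reduceLanes(VectorOperators.ADD);
                for (; i < cols; i++) {
                    sum += w[base + i] * x[i]; // scalar tail
                }
                y[r] = sum;
            }
        }
    }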
Quick Start & Requirements
First download a GGUF model, for example:

    curl -L -O https://huggingface.co/mukel/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_0.gguf

Then run via jbang:

    jbang Llama3.java --help

or directly with the java launcher (Java 21+):

    java --enable-preview --source 21 --add-modules jdk.incubator.vector Llama3.java -i --model <model_path>
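For the GraalVM Native Image path mentioned above, an ahead-of-time build might look like the following sketch. These commands are an assumption for illustration; the repository's documented invocation and flags may differ.

    javac --enable-preview --release 21 --add-modules jdk.incubator.vector Llama3.java
    native-image --enable-preview --add-modules jdk.incubator.vector -cp . Llama3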
Highlighted Details
- Single-file Java implementation with GGUF parsing and a minbpe-based Llama 3 tokenizer
- Grouped-Query Attention and F16, BF16, Q8_0, and Q4_0 quantization
- Matrix-vector multiplication accelerated with the incubating Vector API
- GraalVM Native Image support with AOT model pre-loading
Maintenance & Community
The project is maintained by mukel. Further community engagement details are not explicitly provided in the README.
Licensing & Compatibility
Requires Java 21+ with preview features enabled and the incubating jdk.incubator.vector module; some experimental VM features may require specific GraalVM builds. The license is not stated in this summary; consult the repository.
Limitations & Caveats
Performance tuning may require familiarity with JVM compiler optimizations and the Vector API, and preview and incubator APIs can change between JDK releases.