llama3.java by mukel

Java library for Llama 3 inference

created 1 year ago
773 stars

Project Summary

This project provides a practical implementation of Llama 3, 3.1, and 3.2 inference directly in Java. It is aimed at developers and researchers who want to run LLMs on the JVM, and it also serves as a testbed for compiler work such as GraalVM's optimizations for the JDK Vector API. The primary benefit is efficient LLM inference in a pure Java environment that leverages modern JVM capabilities.

How It Works

The implementation fits in a single Java file. It parses the GGUF model format and includes a Llama 3 tokenizer based on minbpe. It supports Grouped-Query Attention and several quantization formats (F16, BF16, Q8_0, Q4_0), with matrix-vector multiplication optimized via Java's incubating Vector API. The project also offers GraalVM Native Image support for ahead-of-time compilation, including AOT model pre-loading for instant inference.
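
The Vector API optimization boils down to a fused dot product over each row of the weight matrix. The following is a minimal sketch of that idea on plain float arrays (an illustration only, not the project's actual kernel, which also fuses dequantization of the quantized weight blocks):

    import jdk.incubator.vector.FloatVector;
    import jdk.incubator.vector.VectorOperators;
    import jdk.incubator.vector.VectorSpecies;

    // Sketch: y = W * x, with W stored row-major as rows x cols floats.
    // Run with: java --add-modules jdk.incubator.vector ...
    final class MatVec {
        private static final VectorSpecies<Float> S = FloatVector.SPECIES_PREFERRED;

        static void matVec(float[] w, float[] x, float[] y, int rows, int cols) {
            for (int r = 0; r < rows; r++) {
                int base = r * cols;
                FloatVector acc = FloatVector.zero(S);
                int c = 0;
                for (; c < S.loopBound(cols); c += S.length()) {
                    FloatVector wv = FloatVector.fromArray(S, w, base + c);
                    FloatVector xv = FloatVector.fromArray(S, x, c);
                    acc = wv.fma(xv, acc); // lane-wise fused multiply-add
                }
                float sum = acc.reduceLanes(VectorOperators.ADD);
                for (; c < cols; c++) { // scalar tail for leftover columns
                    sum += w[base + c] * x[c];
                }
                y[r] = sum;
            }
        }
    }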

Quick Start & Requirements

  • Install/Run: Use jbang Llama3.java --help or java --enable-preview --source 21 --add-modules jdk.incubator.vector Llama3.java -i --model <model_path>.
  • Prerequisites: Java 21+ (specifically for MemorySegment mmap-ing of GGUF files; see the sketch after this list), GraalVM (EA builds recommended for the latest Vector API support).
  • Setup: Download GGUF model files from Hugging Face (e.g., curl -L -O https://huggingface.co/mukel/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_0.gguf).
  • Docs: https://github.com/mukel/llama3.java
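
The Java 21+ prerequisite comes from the java.lang.foreign API (a preview feature in 21, hence the --enable-preview flag), which is used to memory-map GGUF files. A minimal sketch of that pattern, with an illustrative file name (the project's real loader layers a full GGUF parser on top):

    import java.io.IOException;
    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;
    import java.nio.ByteOrder;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Sketch: mmap a GGUF file into a MemorySegment (Java 21 with --enable-preview).
    class GgufMmap {
        public static void main(String[] args) throws IOException {
            Path gguf = Path.of("Llama-3.2-3B-Instruct-Q4_0.gguf"); // illustrative name
            try (FileChannel ch = FileChannel.open(gguf, StandardOpenOption.READ);
                 Arena arena = Arena.ofConfined()) {
                MemorySegment model = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size(), arena);
                // GGUF files are little-endian and begin with the magic "GGUF" (0x46554747).
                int magic = model.get(ValueLayout.JAVA_INT_UNALIGNED.withOrder(ByteOrder.LITTLE_ENDIAN), 0);
                System.out.printf("magic=0x%08X, size=%d bytes%n", magic, model.byteSize());
            }
        }
    }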

Highlighted Details

  • Single file, no external dependencies.
  • Supports Llama 3, 3.1 (RoPE scaling), and 3.2 (tied embeddings); see the RoPE sketch after this list.
  • Leverages Java Vector API for performance.
  • GraalVM Native Image support with AOT model pre-loading for instant startup (time-to-first-token).
  • CLI modes for chat and instruction following.
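
For context on the RoPE bullet above: rotary position embeddings rotate pairs of query/key components by position-dependent angles, and Llama 3.1's "RoPE scaling" rescales the base frequencies so longer contexts fit. A hedged sketch of plain RoPE under one common pairing convention (the 3.1 frequency-scaling details are omitted):

    // Sketch: apply rotary position embeddings (RoPE) in place to one head's
    // query or key vector. Adjacent pairs (v[i], v[i+1]) are rotated by an angle
    // that depends on the token position and the pair's frequency. theta is the
    // RoPE base (e.g. 500000 for Llama 3); Llama 3.1 additionally rescales these
    // frequencies for long contexts, which is omitted here.
    static void applyRope(float[] v, int headSize, int position, double theta) {
        for (int i = 0; i < headSize; i += 2) {
            double freq = 1.0 / Math.pow(theta, (double) i / headSize);
            double angle = position * freq;
            float cos = (float) Math.cos(angle), sin = (float) Math.sin(angle);
            float x0 = v[i], x1 = v[i + 1];
            v[i]     = x0 * cos - x1 * sin;
            v[i + 1] = x0 * sin + x1 * cos;
        }
    }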

Maintenance & Community

The project is maintained by mukel. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects.

Limitations & Caveats

Requires Java 21+ and preview/incubator JDK features (--enable-preview, the incubating Vector API), which may necessitate specific GraalVM builds (EA builds carry the latest Vector API support). Performance tuning may require an understanding of JVM compiler optimizations and the Vector API.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 67 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

  • llama.cpp by ggml-org: C/C++ library for local LLM inference; 84k stars; created 2 years ago; updated 15 hours ago.