llama3.java by mukel

Java library for Llama 3 inference

Created 1 year ago
782 stars

Top 44.8% on SourcePulse

Project Summary

This project provides a practical implementation of Llama 3, 3.1, and 3.2 inference directly in Java. It's designed for developers and researchers interested in running LLMs on the JVM, particularly for exploring and optimizing compiler features like GraalVM's Vector API. The primary benefit is enabling efficient LLM inference within a pure Java environment, leveraging modern JVM capabilities.

How It Works

The implementation is contained within a single Java file, parsing GGUF model formats and utilizing a Llama 3 tokenizer based on minbpe. It supports Grouped-Query Attention and various quantization formats (F16, BF16, Q8_0, Q4_0), with optimizations for matrix-vector multiplication via Java's Vector API. The project also offers GraalVM Native Image support for ahead-of-time compilation and AOT model pre-loading for instant inference.
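The quantization support can be illustrated with Q4_0, the simplest of the listed formats. As defined by GGML/llama.cpp, a Q4_0 block packs 32 weights: one scale (stored as fp16 on disk; a plain float is used below for clarity) followed by 16 bytes holding 32 packed 4-bit values, each offset by 8. The class and method names in this sketch are illustrative, not taken from the project:

```java
// Illustrative sketch of Q4_0 dequantization (block size 32, as in GGML).
// Per-block layout: one scale, then 16 bytes of packed 4-bit values.
// Low nibbles fill the first half of the block, high nibbles the second.
public class Q4_0Sketch {
    static final int BLOCK_SIZE = 32;

    // Dequantize one block: 16 packed bytes + scale -> 32 floats.
    static float[] dequantizeBlock(byte[] packed, float scale) {
        float[] out = new float[BLOCK_SIZE];
        for (int i = 0; i < BLOCK_SIZE / 2; i++) {
            int lo = packed[i] & 0x0F;         // low nibble  -> element i
            int hi = (packed[i] >> 4) & 0x0F;  // high nibble -> element i + 16
            out[i] = (lo - 8) * scale;
            out[i + BLOCK_SIZE / 2] = (hi - 8) * scale;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] packed = new byte[16];
        packed[0] = (byte) 0x98; // low nibble 8 -> 0.0, high nibble 9 -> +scale
        float[] w = dequantizeBlock(packed, 0.5f);
        System.out.println(w[0] + " " + w[16]); // prints "0.0 0.5"
    }
}
```

The subtraction of 8 recenters the unsigned 4-bit range [0, 15] around zero; the project's hot loops fuse this dequantization into the Vector API matrix-vector kernels rather than materializing full float arrays.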

Quick Start & Requirements

  • Install/Run: Use jbang Llama3.java --help or java --enable-preview --source 21 --add-modules jdk.incubator.vector Llama3.java -i --model <model_path>.
  • Prerequisites: Java 21+ (specifically for MemorySegment mmap-ing), GraalVM (EA builds recommended for latest Vector API support).
  • Setup: Download GGUF model files from Hugging Face (e.g., curl -L -O https://huggingface.co/mukel/Llama-3.2-3B-Instruct-GGUF/resolve/main/Llama-3.2-3B-Instruct-Q4_0.gguf).
  • Docs: https://github.com/mukel/llama3.java

Highlighted Details

  • Single file, no external dependencies.
  • Supports Llama 3, 3.1 (RoPE scaling), and 3.2 (tied embeddings).
  • Leverages Java Vector API for performance.
  • GraalVM Native Image support with AOT model pre-loading for zero-overhead inference.
  • CLI modes for chat and instruction following.
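The Grouped-Query Attention mentioned above shrinks the KV cache by sharing each key/value head across several query heads; the query-to-KV-head mapping is simple integer division. A minimal sketch (names are illustrative, not from the project):

```java
// Illustrative sketch of the Grouped-Query Attention head mapping:
// nHeads query heads share nKvHeads key/value heads, so each group of
// nHeads / nKvHeads consecutive query heads reads the same KV head.
public class GqaSketch {
    // Which key/value head serves a given query head.
    static int kvHeadFor(int queryHead, int nHeads, int nKvHeads) {
        int groupSize = nHeads / nKvHeads; // query heads per KV head
        return queryHead / groupSize;
    }

    public static void main(String[] args) {
        // Llama-3-8B-style configuration: 32 query heads, 8 KV heads.
        for (int q : new int[] {0, 3, 4, 31}) {
            System.out.println("query head " + q
                    + " -> kv head " + kvHeadFor(q, 32, 8));
        }
    }
}
```

With 32 query heads and 8 KV heads (the Llama 3 8B configuration), each KV head serves a group of 4 query heads, cutting KV-cache memory 4x versus standard multi-head attention.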

Maintenance & Community

The project is maintained by mukel. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Permissive MIT license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Requires Java 21+ and preview/incubator JVM features (notably the incubating Vector API), which may necessitate specific GraalVM builds. Performance tuning may require familiarity with JVM compiler optimizations and the Vector API.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Junyang Lin (Core Maintainer at Alibaba Qwen), Georgi Gerganov (author of llama.cpp, whisper.cpp), and 1 more.

LLMFarm by guinmoon

Top 0.4% on SourcePulse · 2k stars
iOS/macOS app for local LLM inference
Created 2 years ago
Updated 1 month ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Gabriel Almeida (Cofounder of Langflow), and 2 more.

torchchat by pytorch

Top 0.1% on SourcePulse · 4k stars
PyTorch-native SDK for local LLM inference across diverse platforms
Created 1 year ago
Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 8 more.

lit-llama by Lightning-AI

Top 0.1% on SourcePulse · 6k stars
LLaMA implementation for pretraining, finetuning, and inference
Created 2 years ago
Updated 2 months ago
Starred by Roy Frostig (coauthor of JAX; Research Scientist at Google DeepMind), Zhiqiang Xie (coauthor of SGLang), and 40 more.

llama by meta-llama

Top 0.1% on SourcePulse · 59k stars
Inference code for Llama 2 models (deprecated)
Created 2 years ago
Updated 7 months ago