beehive-lab
GPU-accelerated LLM inference in pure Java
This project provides GPU-accelerated Large Language Model (LLM) inference directly within the Java ecosystem using TornadoVM. It targets Java developers who want to integrate LLMs such as Llama3 and Mistral into their applications without relying on Python.
How It Works
GPULlama3.java leverages TornadoVM to automatically compile and accelerate Java code for GPU execution. It builds upon the Llama3.java library, enabling inference for various LLM architectures (Llama3, Mistral, Qwen, Phi-3, Granite) in GGUF format. The core advantage lies in bringing native GPU acceleration for LLMs to the Java Virtual Machine, facilitating seamless integration with Java frameworks.
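To make the idea concrete, here is a minimal sketch of the kind of data-parallel loop that dominates LLM inference, a matrix-vector multiply, written in plain Java. It is not the project's actual kernel; the class and method names are hypothetical and chosen for illustration. In a TornadoVM version, the outer loop is the one that would typically be marked with TornadoVM's `@Parallel` annotation so that each output row can be computed by a separate GPU thread. The annotation is left out here so the example runs on any JDK.

```java
public class MatVecSketch {

    // Computes out[i] = sum over j of weights[i * cols + j] * x[j].
    // The weight matrix is stored row-major in a flat array.
    static void matVec(float[] weights, float[] x, float[] out, int rows, int cols) {
        // Each iteration of this outer loop is independent of the others,
        // which is what makes it a candidate for GPU parallelization.
        for (int i = 0; i < rows; i++) {
            float sum = 0.0f;
            for (int j = 0; j < cols; j++) {
                sum += weights[i * cols + j] * x[j];
            }
            out[i] = sum;
        }
    }

    public static void main(String[] args) {
        float[] weights = {1, 2, 3, 4, 5, 6}; // a 2 x 3 matrix
        float[] x = {1, 0, -1};
        float[] out = new float[2];
        matVec(weights, x, out, 2, 3);
        System.out.println(out[0] + " " + out[1]); // prints "-2.0 -2.0"
    }
}
```

Because no row depends on another, the same method body can in principle be executed sequentially on the CPU or in parallel on an accelerator without changing its result.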
Quick Start & Requirements
llama-tornado) or JBang for execution. Maven dependency is also available.
Highlighted Details
Maintenance & Community
The project is built upon Llama3.java by Alfonso² Peterssen. Development is partially funded by several EU & UKRI grants, including Horizon Europe and UKRI AERO. A roadmap is available for future development.
Licensing & Compatibility
The project is released under the MIT license, which is permissive for commercial use and integration into closed-source applications.
Limitations & Caveats
The project is at an early stage, as is AI integration in Java more broadly. Support for Intel, Apple Silicon, and AMD GPUs is marked as Work In Progress (WIP). Users may encounter GPU out-of-memory errors, which call for adjusting the GPU memory allocation or using a more heavily quantized model. Performance depends strongly on the specific hardware and model used.
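As a rough guide to why quantization affects out-of-memory errors, the sketch below estimates how much memory the model weights alone occupy at different precisions. The 8-billion parameter count is a hypothetical example, and the class and method names are illustrative. The bits-per-weight figures for Q8_0 and Q4_0 follow the standard GGUF block layouts (32 weights plus one 16-bit scale per block); whether a given model file uses these formats is an assumption. The estimate ignores the KV cache, activations, and runtime overhead, so actual GPU memory use will be higher.

```java
public class WeightMemoryEstimate {

    // Returns the approximate size of the weights in GiB.
    static double weightGiB(long parameterCount, double bitsPerWeight) {
        double bytes = parameterCount * bitsPerWeight / 8.0;
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long params = 8_000_000_000L; // hypothetical 8B-parameter model

        // FP16: 16 bits per weight.
        System.out.printf("FP16: %.1f GiB%n", weightGiB(params, 16.0));
        // Q8_0: 34 bytes per 32-weight block = 8.5 bits per weight.
        System.out.printf("Q8_0: %.1f GiB%n", weightGiB(params, 8.5));
        // Q4_0: 18 bytes per 32-weight block = 4.5 bits per weight.
        System.out.printf("Q4_0: %.1f GiB%n", weightGiB(params, 4.5));
    }
}
```

If the estimate for a chosen precision is close to or above the GPU's available memory, a lower-precision file or a larger memory allocation is the first thing to try.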