LlamaEdge by LlamaEdge

LLM inference runtime for local or edge deployments

created 1 year ago
1,464 stars

Top 28.6% on sourcepulse

View on GitHub
Project Summary

LlamaEdge provides a fast, portable, and secure way to run customized and fine-tuned Large Language Models (LLMs) locally or on edge devices. It targets developers and researchers seeking an alternative to Python-based AI inference, offering an OpenAI-compatible API for various GenAI tasks including text generation, speech-to-text, text-to-speech, and text-to-image.

How It Works

LlamaEdge leverages the Rust+Wasm (WebAssembly) stack, specifically the WasmEdge runtime, to execute LLMs. This approach offers a lightweight (~30 MB runtime), high-performance (native speed, including GPU acceleration), and secure (sandboxed) execution environment. It loads models in the GGUF format and integrates llama.cpp as its backend via the WASI-NN plugin's GGML backend, enabling broad compatibility with Llama-family models (Llama 2, Llama 3, and derivatives) and cross-platform deployment across various CPUs, GPUs, and operating systems.
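
The --nn-preload flag used in the Quick Start below is how the host runtime hands a model to the sandboxed Wasm app: a colon-separated string of alias, backend, execution target, and model path. A minimal sketch of the invocation pattern (model.gguf and app.wasm are placeholders):

```bash
# Anatomy of a WASI-NN preload string: alias:backend:target:model-path
#   default -> alias the Wasm app uses to look up the loaded model
#   GGML    -> backend plugin (llama.cpp)
#   AUTO    -> execution target, chosen automatically (CPU or GPU)
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:model.gguf \
  app.wasm
```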

Quick Start & Requirements

  • Install WasmEdge: curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
  • Prerequisites: WasmEdge runtime. For CPU-only, libopenblas-dev may be required (sudo apt install -y libopenblas-dev). CUDA drivers are auto-detected for GPU support.
  • Example: Download a model (e.g., Llama 3.2 1B GGUF) and llama-chat.wasm, then run: wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-chat.wasm -p llama-3-chat (the full sequence is shown after this list)
  • Docs: https://github.com/LlamaEdge/LlamaEdge
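
Putting the steps above together, a full first run looks like the following. The download URLs follow the upstream quick-start pattern, with the model hosted in second-state's Hugging Face repo; verify current links against the docs.

```bash
# 1. Install the WasmEdge runtime (the installer auto-detects CUDA for GPU support).
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash

# 2. Fetch a GGUF model and the chat app (example URLs; check the docs for current links).
curl -LO https://huggingface.co/second-state/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q5_K_M.gguf
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

# 3. Start an interactive chat session (model and app must sit in the same directory).
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm -p llama-3-chat
```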

Highlighted Details

  • Supports LLM inference, speech-to-text, text-to-speech, text-to-image, and multimodal applications.
  • Offers an OpenAI-compatible API server for seamless integration (see the example request after this list).
  • Cross-platform compatibility across macOS, Linux, Windows, x86, ARM, RISC-V, NVIDIA GPUs, and Apple GPUs.
  • Rust source code is available for modification and custom use.
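
The API server ships as a separate Wasm app (llama-api-server.wasm in the upstream releases). A minimal sketch, assuming the default port 8080 and the --prompt-template/--model-name flags documented upstream:

```bash
# Start the OpenAI-compatible server (llama-api-server.wasm; verify flags with --help).
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm --prompt-template llama-3-chat --model-name llama-3.2-1b

# Query it with a standard OpenAI chat-completions payload.
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama-3.2-1b",
        "messages": [{"role": "user", "content": "What is WebAssembly?"}]
      }'
```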

Maintenance & Community

The project is actively maintained by the Second State team. Community discussion and further resources are available through the GitHub repository.

Licensing & Compatibility

The project's source code is open source, allowing modification and free use. Specific licensing details for the core components and dependencies should be verified for commercial or closed-source integration.

Limitations & Caveats

The project notes a known "module name conflict" error that is reportedly a false positive and does not impact functionality. Users with limited RAM may need to reduce the context and batch sizes to avoid WASI-NN backend errors; see the sketch below. Model files and applications must be in the same directory for the model to load correctly.
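
For low-RAM machines, the context and batch sizes can be dialed down at launch. A sketch assuming the --ctx-size and --batch-size flags exposed by the chat app (verify with --help):

```bash
# Shrink the context window and the prompt batch size to cut memory use
# (flag names assumed from the chat app's CLI; confirm with: wasmedge ... llama-chat.wasm --help).
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm -p llama-3-chat --ctx-size 1024 --batch-size 256
```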

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 0

Star History

106 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Anil Dash (former CEO of Glitch), and 15 more.

llamafile by Mozilla-Ocho
Top 0.2% · 23k stars
Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc
created 1 year ago · updated 1 month ago

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
Top 0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 14 hours ago