LLM inference runtime for local or edge deployments
Top 28.6% on sourcepulse
LlamaEdge provides a fast, portable, and secure way to run customized and fine-tuned Large Language Models (LLMs) locally or on edge devices. It targets developers and researchers seeking an alternative to Python-based AI inference, offering an OpenAI-compatible API for various GenAI tasks including text generation, speech-to-text, text-to-speech, and text-to-image.
How It Works
LlamaEdge leverages the Rust+Wasm (WebAssembly) stack, specifically the WasmEdge runtime, to execute LLMs. This approach offers a lightweight (30MB runtime), high-performance (native speed on GPUs), and secure (sandboxed) execution environment. It utilizes the GGML format for models and integrates llama.cpp as its backend via the WASI-NN plugin, enabling broad compatibility with Llama2-based models and cross-platform deployment across various CPUs, GPUs, and operating systems.
Quick Start & Requirements
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
libopenblas-dev
may be required (sudo apt install -y libopenblas-dev
). CUDA drivers are auto-detected for GPU support.llama-chat.wasm
, then run: wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-chat.wasm -p llama-3-chat
Highlighted Details
Maintenance & Community
The project is actively maintained by the Second State team. Further community engagement and resources can be found via their GitHub repository.
Licensing & Compatibility
The project's source code is open source, allowing modification and free use. Specific licensing details for the core components and dependencies should be verified for commercial or closed-source integration.
Limitations & Caveats
The project notes a known "module name conflict" error that is reportedly a false positive and does not impact functionality. Users with lower RAM may need to adjust context and batch sizes to avoid WASI-NN backend errors. Ensure model files and applications are in the same directory for proper loading.
1 week ago
Inactive