LlamaEdge by LlamaEdge

LLM inference runtime for local or edge deployments

created 1 year ago
1,464 stars

Top 28.6% on sourcepulse

View on GitHub
Project Summary

LlamaEdge provides a fast, portable, and secure way to run customized and fine-tuned Large Language Models (LLMs) locally or on edge devices. It targets developers and researchers seeking an alternative to Python-based AI inference, offering an OpenAI-compatible API for various GenAI tasks including text generation, speech-to-text, text-to-speech, and text-to-image.

How It Works

LlamaEdge leverages the Rust+Wasm (WebAssembly) stack, specifically the WasmEdge runtime, to execute LLMs. This approach offers a lightweight (~30 MB runtime), high-performance (native speed, including GPU acceleration), and secure (sandboxed) execution environment. It loads models in the GGUF format and integrates llama.cpp as its backend via the WASI-NN plugin's GGML backend, enabling broad compatibility with Llama-family models (Llama 2, Llama 3, and derivatives) and cross-platform deployment across various CPUs, GPUs, and operating systems.
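
The --nn-preload flag used in the Quick Start below is how the host runtime hands a model to the sandboxed Wasm app: a colon-separated string of alias, backend, execution target, and model path. A minimal sketch of the invocation pattern (model.gguf and app.wasm are placeholders):

```bash
# Anatomy of a WASI-NN preload string: alias:backend:target:model-path
#   default -> alias the Wasm app uses to look up the loaded model
#   GGML    -> backend plugin (llama.cpp)
#   AUTO    -> execution target, chosen automatically (CPU or GPU)
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:model.gguf \
  app.wasm
```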

Quick Start & Requirements

  • Install WasmEdge: curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash
  • Prerequisites: WasmEdge runtime. For CPU-only, libopenblas-dev may be required (sudo apt install -y libopenblas-dev). CUDA drivers are auto-detected for GPU support.
  • Example: Download a model (e.g., Llama 3.2 1B GGUF) and llama-chat.wasm, then run: wasmedge --dir .:. --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf llama-chat.wasm -p llama-3-chat (the full sequence is shown after this list)
  • Docs: https://github.com/LlamaEdge/LlamaEdge
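
Putting the steps above together, a full first run looks like the following. The download URLs follow the upstream quick-start pattern, with the model hosted in second-state's Hugging Face repo; verify current links against the docs.

```bash
# 1. Install the WasmEdge runtime (the installer auto-detects CUDA for GPU support).
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install_v2.sh | bash

# 2. Fetch a GGUF model and the chat app (example URLs; check the docs for current links).
curl -LO https://huggingface.co/second-state/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q5_K_M.gguf
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm

# 3. Start an interactive chat session (model and app must sit in the same directory).
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm -p llama-3-chat
```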

Highlighted Details

  • Supports LLM inference, speech-to-text, text-to-speech, text-to-image, and multimodal applications.
  • Offers an OpenAI-compatible API server for seamless integration (see the example request after this list).
  • Cross-platform compatibility across macOS, Linux, Windows, x86, ARM, RISC-V, NVIDIA GPUs, and Apple GPUs.
  • Rust source code is available for modification and custom use.
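
The API server ships as a separate Wasm app (llama-api-server.wasm in the upstream releases). A minimal sketch, assuming the default port 8080 and the --prompt-template/--model-name flags documented upstream:

```bash
# Start the OpenAI-compatible server (llama-api-server.wasm; verify flags with --help).
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf \
  llama-api-server.wasm --prompt-template llama-3-chat --model-name llama-3.2-1b

# Query it with a standard OpenAI chat-completions payload.
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
        "model": "llama-3.2-1b",
        "messages": [{"role": "user", "content": "What is WebAssembly?"}]
      }'
```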

Maintenance & Community

The project is actively maintained by the Second State team. Community discussion and further resources are available through the GitHub repository.

Licensing & Compatibility

The project's source code is open source, allowing modification and free use. Specific licensing details for the core components and dependencies should be verified for commercial or closed-source integration.

Limitations & Caveats

The project notes a known "module name conflict" error that is reportedly a false positive and does not impact functionality. Users with limited RAM may need to reduce the context and batch sizes to avoid WASI-NN backend errors; see the sketch below. Model files and applications must be in the same directory for the model to load correctly.
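
For low-RAM machines, the context and batch sizes can be dialed down at launch. A sketch assuming the --ctx-size and --batch-size flags exposed by the chat app (verify with --help):

```bash
# Shrink the context window and the prompt batch size to cut memory use
# (flag names assumed from the chat app's CLI; confirm with: wasmedge ... llama-chat.wasm --help).
wasmedge --dir .:. \
  --nn-preload default:GGML:AUTO:Llama-3.2-1B-Instruct-Q5_K_M.gguf \
  llama-chat.wasm -p llama-3-chat --ctx-size 1024 --batch-size 256
```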

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 6
  • Issues (30d): 0

Star History

106 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Anil Dash (former CEO of Glitch), and 15 more.

llamafile by Mozilla-Ocho
Top 0.2% · 23k stars
Single-file LLM distribution and runtime via `llama.cpp` and Cosmopolitan Libc
created 1 year ago · updated 1 month ago

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (former CEO of GitHub), and 32 more.

llama.cpp by ggml-org
Top 0.4% · 84k stars
C/C++ library for local LLM inference
created 2 years ago · updated 14 hours ago