Nexa SDK: local inference framework for GGML/ONNX models
Nexa SDK is a versatile, local-first inference framework designed for developers and researchers working with GGML and ONNX models. It supports a broad spectrum of AI tasks including text generation, image generation, vision-language models (VLM), audio-language models, automatic speech recognition (ASR), and text-to-speech (TTS), enabling on-device deployment across various platforms.
How It Works
Nexa SDK leverages the GGML tensor library for efficient CPU and GPU (CUDA, Metal, ROCm, Vulkan, SYCL) inference, and also supports ONNX models. It provides an OpenAI-compatible server, a local Streamlit UI for interactive model testing, and bindings for Android (Kotlin) and iOS (Swift), facilitating cross-platform development and deployment. The framework includes tools for model conversion and quantization, simplifying the process of preparing models for local inference.
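Because the server is OpenAI-compatible, any OpenAI-style client can talk to it. Below is a minimal sketch of building a chat-completion request payload; the model name, port, and `/v1/chat/completions` path are assumptions based on the OpenAI API convention, not values confirmed by this document.

```python
import json

def build_chat_request(model, prompt, temperature=0.7):
    """Build an OpenAI-style chat completion payload (dict)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

# Hypothetical local model name; substitute whatever model the server is serving.
payload = build_chat_request("llama3", "Hello!")
body = json.dumps(payload)
# POST `body` to e.g. http://localhost:8000/v1/chat/completions (assumed default
# address), using urllib.request or the `openai` client with a local base URL.
```

Pointing an existing OpenAI client at the local base URL is usually the least invasive way to swap a hosted model for an on-device one.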
Quick Start & Requirements
# Install via the bootstrap script:
curl -fsSL https://public-storage.nexa4ai.com/install.sh | sh

# Or install the CPU wheel with pip:
pip install nexaai --prefer-binary --index-url https://github.nexa.ai/whl/cpu --extra-index-url https://pypi.org/simple --no-cache-dir

# Or install with CUDA (cu124) support:
CMAKE_ARGS="-DGGML_CUDA=ON" pip install nexaai --prefer-binary --index-url https://github.nexa.ai/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir

# Optional extras:
pip install "nexaai[feature]"
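After running one of the installs above, a quick sanity check is to confirm the package is importable. This is a generic sketch; it assumes the wheel installs a top-level `nexaai` module, which is not confirmed by this document.

```python
import importlib.util

def is_installed(package):
    """Return True if `package` is importable in the current environment."""
    return importlib.util.find_spec(package) is not None

if is_installed("nexaai"):  # assumed top-level module name
    print("nexaai is installed")
else:
    print("nexaai not found; re-run one of the pip commands above")
```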
Highlighted Details
Includes lm-eval-harness integration for evaluating GGUF models.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats