nexa-sdk by NexaAI

Nexa SDK: local inference framework for GGML/ONNX models

created 11 months ago
4,632 stars

Top 10.8% on sourcepulse

View on GitHub
Project Summary

Nexa SDK is a versatile, local-first inference framework designed for developers and researchers working with GGML and ONNX models. It supports a broad spectrum of AI tasks including text generation, image generation, vision-language models (VLM), audio-language models, automatic speech recognition (ASR), and text-to-speech (TTS), enabling on-device deployment across various platforms.

How It Works

Nexa SDK leverages the GGML tensor library for efficient CPU and GPU (CUDA, Metal, ROCm, Vulkan, SYCL) inference, and also supports ONNX models. It provides an OpenAI-compatible server, a local Streamlit UI for interactive model testing, and bindings for Android (Kotlin) and iOS (Swift), facilitating cross-platform development and deployment. The framework includes tools for model conversion and quantization, simplifying the process of preparing models for local inference.
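
Because the server exposes an OpenAI-compatible API, any standard OpenAI client should be able to talk to it. Below is a minimal, hypothetical sketch using the official openai Python package; the base URL, port, and model identifier are assumptions for illustration, not values documented in this summary.

    # Hypothetical client-side sketch: querying a locally running Nexa SDK
    # OpenAI-compatible server with the standard `openai` Python client.
    # The host, port, and model name are assumptions -- substitute whatever
    # your local server actually reports.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed local endpoint
        api_key="not-needed-for-local-use",   # local servers typically ignore the key
    )

    response = client.chat.completions.create(
        model="llama3.2",  # placeholder model identifier
        messages=[{"role": "user", "content": "Explain what a VLM is in one sentence."}],
    )
    print(response.choices[0].message.content)

The same request shape works for any other OpenAI-compatible client library or a plain HTTP POST to the chat completions endpoint.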

Quick Start & Requirements

  • Installation (a post-install sanity check is sketched after this list):
    • Executable Installer: curl -fsSL https://public-storage.nexa4ai.com/install.sh | sh
    • Python Package (CPU): pip install nexaai --prefer-binary --index-url https://github.nexa.ai/whl/cpu --extra-index-url https://pypi.org/simple --no-cache-dir
    • Python Package (CUDA 12.0+): CMAKE_ARGS="-DGGML_CUDA=ON" pip install nexaai --prefer-binary --index-url https://github.nexa.ai/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
    • Additional features (ONNX, eval, convert, TTS) can be installed via pip install "nexaai[feature]".
  • Prerequisites: CUDA Toolkit 12.0+ for CUDA support, ROCm 6.2.1+ for AMD GPU support, Vulkan SDK 1.3.261.1+ for Vulkan support, Intel GPU drivers and oneAPI for SYCL support.
  • Documentation: see the project documentation.
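
After installation, a quick sanity check can confirm that the nexaai package resolved from the custom index. The sketch below uses only the Python standard library, so it makes no assumptions about nexaai's own API.

    # Post-install check: confirm the nexaai package is installed and report
    # which version was pulled from the custom index. Standard library only.
    from importlib.metadata import PackageNotFoundError, version

    try:
        print(f"nexaai {version('nexaai')} is installed")
    except PackageNotFoundError:
        print("nexaai not found; re-run the pip command for your backend")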

Highlighted Details

  • Supports a wide range of AI modalities: text, image, VLM, audio, ASR, TTS.
  • Offers GPU acceleration for CUDA, Metal, ROCm, Vulkan, and SYCL.
  • Includes an OpenAI-compatible server and a local Streamlit UI.
  • Provides mobile bindings for Android (Kotlin) and iOS (Swift).
  • Features a benchmark system claimed to be 50x faster than lm-eval-harness for GGUF models.

Maintenance & Community

  • The repository indicates a weekly release cadence.
  • Community support via Discord.
  • Project updates shared on X (Twitter).

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The project's licensing is not clearly stated in the README, which may impact commercial adoption.
  • Each GPU backend has its own installation requirements (e.g., SYCL on Windows needs Intel GPU drivers and oneAPI), so setup can be involved.
Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 4
  • Issues (30d): 93
  • Star History: 123 stars in the last 90 days
