nexa-sdk by NexaAI

Nexa SDK: local inference framework for GGML/ONNX models

created 11 months ago
4,632 stars

Top 10.8% on sourcepulse

View on GitHub
Project Summary

Nexa SDK is a versatile, local-first inference framework designed for developers and researchers working with GGML and ONNX models. It supports a broad spectrum of AI tasks including text generation, image generation, vision-language models (VLM), audio-language models, automatic speech recognition (ASR), and text-to-speech (TTS), enabling on-device deployment across various platforms.

How It Works

Nexa SDK leverages the GGML tensor library for efficient CPU and GPU (CUDA, Metal, ROCm, Vulkan, SYCL) inference, and also supports ONNX models. It provides an OpenAI-compatible server, a local Streamlit UI for interactive model testing, and bindings for Android (Kotlin) and iOS (Swift), facilitating cross-platform development and deployment. The framework includes tools for model conversion and quantization, simplifying the process of preparing models for local inference.
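
Because the server exposes an OpenAI-compatible API, any standard OpenAI client should be able to talk to it. Below is a minimal, hypothetical sketch using the official openai Python package; the base URL, port, and model identifier are assumptions for illustration, not values documented in this summary.

    # Hypothetical client-side sketch: querying a locally running Nexa SDK
    # OpenAI-compatible server with the standard `openai` Python client.
    # The host, port, and model name are assumptions -- substitute whatever
    # your local server actually reports.
    from openai import OpenAI

    client = OpenAI(
        base_url="http://localhost:8000/v1",  # assumed local endpoint
        api_key="not-needed-for-local-use",   # local servers typically ignore the key
    )

    response = client.chat.completions.create(
        model="llama3.2",  # placeholder model identifier
        messages=[{"role": "user", "content": "Explain what a VLM is in one sentence."}],
    )
    print(response.choices[0].message.content)

The same request shape works for any other OpenAI-compatible client library or a plain HTTP POST to the chat completions endpoint.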

Quick Start & Requirements

  • Installation (a post-install sanity check is sketched after this list):
    • Executable Installer: curl -fsSL https://public-storage.nexa4ai.com/install.sh | sh
    • Python Package (CPU): pip install nexaai --prefer-binary --index-url https://github.nexa.ai/whl/cpu --extra-index-url https://pypi.org/simple --no-cache-dir
    • Python Package (CUDA 12.0+): CMAKE_ARGS="-DGGML_CUDA=ON" pip install nexaai --prefer-binary --index-url https://github.nexa.ai/whl/cu124 --extra-index-url https://pypi.org/simple --no-cache-dir
    • Additional features (ONNX, eval, convert, TTS) can be installed via pip install "nexaai[feature]".
  • Prerequisites: CUDA Toolkit 12.0+ for CUDA support, ROCm 6.2.1+ for AMD GPU support, Vulkan SDK 1.3.261.1+ for Vulkan support, Intel GPU drivers and oneAPI for SYCL support.
  • Documentation: see the project documentation.
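
After installation, a quick sanity check can confirm that the nexaai package resolved from the custom index. The sketch below uses only the Python standard library, so it makes no assumptions about nexaai's own API.

    # Post-install check: confirm the nexaai package is installed and report
    # which version was pulled from the custom index. Standard library only.
    from importlib.metadata import PackageNotFoundError, version

    try:
        print(f"nexaai {version('nexaai')} is installed")
    except PackageNotFoundError:
        print("nexaai not found; re-run the pip command for your backend")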

Highlighted Details

  • Supports a wide range of AI modalities: text, image, VLM, audio, ASR, TTS.
  • Offers GPU acceleration for CUDA, Metal, ROCm, Vulkan, and SYCL.
  • Includes an OpenAI-compatible server and a local Streamlit UI.
  • Provides mobile bindings for Android (Kotlin) and iOS (Swift).
  • Features a benchmark system claimed to be 50x faster than lm-eval-harness for GGUF models.

Maintenance & Community

  • The repository indicates a weekly release cadence.
  • Community support via Discord.
  • Project updates shared on X (Twitter).

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The project's licensing is not clearly stated in the README, which may impact commercial adoption.
  • Each GPU backend has its own installation requirements (e.g., SYCL on Windows needs Intel GPU drivers and oneAPI), so setup can be involved.
Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 4
  • Issues (30d): 93
  • Star History: 123 stars in the last 90 days
