edgen by edgenai

Local GenAI server for private, offline AI

created 1 year ago
364 stars

Top 78.4% on sourcepulse

Project Summary

Edgen provides a local, private GenAI server that acts as a drop-in replacement for OpenAI's API, targeting developers and users who need on-device inference for LLMs, speech-to-text, and other AI models. It offers data privacy, reliability, and cost savings by leveraging user hardware, eliminating the need for cloud infrastructure and API keys.

How It Works

Built in Rust, Edgen is natively compiled for Windows, macOS, and Linux, abstracting the complexities of AI model optimization across different hardware and platforms. It supports model caching to avoid redundant downloads and offers modularity for easy integration of new models and runtimes. The server implements an OpenAI-compatible API, exposing endpoints for chat completions, audio transcriptions, and embeddings, with plans for image generation and multimodal chat.
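
Because the API mirrors OpenAI's, an existing OpenAI client can usually be pointed at the local server just by changing the base URL. Below is a minimal sketch using only the Python standard library; the 127.0.0.1:33322 address and the model name are assumptions for illustration, not values from the README, so check Edgen's documentation for the actual defaults.

```python
import json
import urllib.request

def build_chat_request(base_url, messages, model="default"):
    """Build an OpenAI-style chat completion request aimed at a local server."""
    url = f"{base_url}/v1/chat/completions"
    payload = {"model": model, "messages": messages}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return req, payload

# Build (but do not send) a request against an assumed local address.
req, payload = build_chat_request(
    "http://127.0.0.1:33322",  # hypothetical default; see Edgen's docs
    [{"role": "user", "content": "Hello!"}],
)

# Against a running server, the response follows the OpenAI schema:
#   with urllib.request.urlopen(req) as resp:
#       reply = json.load(resp)["choices"][0]["message"]["content"]
```

Note that no API key header is needed, which is one of the points of running inference locally.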

Quick Start & Requirements

  • Install/Run: Download and run the binary. The serve command starts the server (default behavior).
  • Prerequisites: No Docker required. Optional GPU support via Vulkan, CUDA, or Metal requires respective SDKs/Toolkits.
  • Resources: The README does not state a specific resource footprint; the server is designed to run on local user hardware.
  • Links: Documentation, Blog, Discord, Roadmap, EdgenChat

Highlighted Details

  • OpenAI-compatible API for seamless integration.
  • Supports LLMs (Llama2, Mistral, Mixtral) and Whisper speech-to-text.
  • Native compilation for Windows, macOS, and Linux; no Docker needed.
  • Optional GPU acceleration via Vulkan, CUDA, and Metal.
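
The Whisper endpoint follows the same OpenAI convention: an audio file and a model field POSTed as a multipart form to /v1/audio/transcriptions. A standard-library sketch of building such a request is below; the server address, model name, and content type are illustrative assumptions, not values from the README.

```python
import io
import urllib.request
import uuid

def build_transcription_request(base_url, audio_bytes, filename, model="whisper"):
    """Build an OpenAI-style multipart request for audio transcription."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()

    def part(name, value, fname=None, ctype=None):
        # One multipart/form-data field, with optional filename and content type.
        body.write(f"--{boundary}\r\n".encode())
        disposition = f'Content-Disposition: form-data; name="{name}"'
        if fname:
            disposition += f'; filename="{fname}"'
        body.write((disposition + "\r\n").encode())
        if ctype:
            body.write(f"Content-Type: {ctype}\r\n".encode())
        body.write(b"\r\n")
        body.write(value if isinstance(value, bytes) else value.encode())
        body.write(b"\r\n")

    part("model", model)
    part("file", audio_bytes, fname=filename, ctype="audio/wav")
    body.write(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body.getvalue(),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )

# Build a request against an assumed local address (not sent here).
req = build_transcription_request("http://127.0.0.1:33322", b"\x00\x01", "clip.wav")
```

Clients written against OpenAI's audio API should therefore work unchanged once pointed at the local server.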

Maintenance & Community

  • Active development with a public roadmap.
  • Community support via Discord and GitHub discussions/issues.
  • Acknowledges contributions from llama.cpp, whisper.cpp, and ggml.

Licensing & Compatibility

  • License not explicitly stated in the README. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

Image generation and multimodal chat completions are listed as future features. The README notes that Vulkan, CUDA, and Metal GPU features cannot be enabled simultaneously.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 10 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

  • GPU cluster manager for AI model deployment
  • Top 1.6%; 3k stars; created 1 year ago; updated 2 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Andre Zayarni (Cofounder of Qdrant), and 2 more.

RealChar by Shaunwei

  • Real-time AI character/companion creation and interaction codebase
  • Top 0.1%; 6k stars; created 2 years ago; updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Pietro Schirano (Founder of MagicPath), and 1 more.

SillyTavern by SillyTavern

  • LLM frontend for power users
  • Top 3.2%; 17k stars; created 2 years ago; updated 3 days ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; Author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

  • C/C++ library for local LLM inference
  • Top 0.4%; 84k stars; created 2 years ago; updated 13 hours ago