aha by jhqxxx

Local AI inference engine for multimodal tasks

Created 6 months ago
335 stars

Top 82.2% on SourcePulse

Project Summary

Summary

Aha is a high-performance, cross-platform AI inference engine built with Rust and the Candle framework. It enables users to run state-of-the-art text, vision, speech, and OCR models locally, eliminating the need for API keys or cloud dependencies. This offers a fast, private, and efficient solution for deploying diverse AI capabilities directly on user hardware.

How It Works

Aha utilizes Rust's memory safety and Candle's efficient tensor computation for its core inference engine. This architecture facilitates cross-platform compatibility (Linux, macOS, Windows) and a local-first processing model. Performance is further enhanced through optional GPU acceleration via CUDA or Metal, and optimized long-sequence handling with Flash Attention.

Quick Start & Requirements

  • Installation: Clone the repository and build using Cargo: git clone https://github.com/jhqxxx/aha.git && cd aha && cargo build --release.
  • Prerequisites: Rust toolchain. Optional: NVIDIA drivers/CUDA toolkit for cuda feature, Metal support for macOS.
  • Features: Build with features like cuda, metal, flash-attn, ffmpeg for specific hardware acceleration or multimedia processing.
  • CLI: A command-line interface supports listing, downloading, running inference, and starting a local service with an OpenAI-compatible API (aha list, aha download, aha run, aha serv).
  • Docs: Links to detailed guides for installation, CLI, API, and supported models are available within the README.
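Since `aha serv` exposes an OpenAI-compatible API, any standard OpenAI-style client should work against it. The sketch below builds such a request with only the Python standard library; the host, port, endpoint path, and model id are assumptions for illustration, not values confirmed by the project docs.

```python
import json
import urllib.request

# Hypothetical endpoint for a locally running `aha serv` instance.
# Host, port, path, and model name are assumptions -- check the
# project's API docs for the values it actually exposes.
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "qwen",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize this image-free prompt in one sentence."}
    ],
}

body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    BASE_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the local service is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])

print(req.get_method(), req.full_url)
```

Because the server speaks the OpenAI wire format, existing OpenAI SDKs can also be pointed at the local base URL instead of the cloud endpoint, keeping all inference on-device.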

Highlighted Details

  • Broad Model Support: Integrates numerous models across text (Qwen, MiniCPM), vision (Qwen-VL), OCR (DeepSeek-OCR, PaddleOCR-VL), and speech (VoxCPM, Fun-ASR).
  • Local-First & Privacy: All AI processing occurs on the user's machine, ensuring data privacy and offline capability.
  • GPU Acceleration: Leverages CUDA for NVIDIA GPUs and Metal for Apple Silicon for significantly faster inference.
  • Optimized Inference: Includes optional support for Flash Attention to improve performance on long input sequences.

Maintenance & Community

The project shows active development with frequent updates to model support and features, as evidenced by its recent changelog. Contributions are welcomed, though specific community channels (like Discord/Slack) or major sponsorships are not detailed in the provided text.

Licensing & Compatibility

Licensed under Apache-2.0, a permissive license that permits commercial use and integration into closed-source projects, provided its attribution and notice requirements are met.

Limitations & Caveats

The Qwen3.5 4B model is noted as having unresolved issues. As a Rust project built on the Candle framework, adoption may also require familiarity with both technologies.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
3
Star History
102 stars in the last 30 days

