aha by jhqxxx

Local AI inference engine for multimodal tasks

Created 6 months ago
335 stars

Top 82.2% on SourcePulse

Project Summary

Summary

Aha is a high-performance, cross-platform AI inference engine built with Rust and the Candle framework. It enables users to run state-of-the-art text, vision, speech, and OCR models locally, eliminating the need for API keys or cloud dependencies. This offers a fast, private, and efficient solution for deploying diverse AI capabilities directly on user hardware.

How It Works

Aha utilizes Rust's memory safety and Candle's efficient tensor computation for its core inference engine. This architecture facilitates cross-platform compatibility (Linux, macOS, Windows) and a local-first processing model. Performance is further enhanced through optional GPU acceleration via CUDA or Metal, and optimized long-sequence handling with Flash Attention.

Quick Start & Requirements

  • Installation: Clone the repository and build using Cargo: git clone https://github.com/jhqxxx/aha.git && cd aha && cargo build --release.
  • Prerequisites: Rust toolchain. Optional: NVIDIA drivers/CUDA toolkit for cuda feature, Metal support for macOS.
  • Features: Build with features like cuda, metal, flash-attn, ffmpeg for specific hardware acceleration or multimedia processing.
  • CLI: A command-line interface supports listing, downloading, running inference, and starting a local service with an OpenAI-compatible API (aha list, aha download, aha run, aha serv).
  • Docs: Links to detailed guides for installation, CLI, API, and supported models are available within the README.
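Since `aha serv` exposes an OpenAI-compatible API, any standard OpenAI-style client should work against it. The sketch below builds such a request with only the Python standard library; the host, port, endpoint path, and model id are assumptions for illustration, not values confirmed by the project docs.

```python
import json
import urllib.request

# Hypothetical endpoint for a locally running `aha serv` instance.
# Host, port, path, and model name are assumptions -- check the
# project's API docs for the values it actually exposes.
BASE_URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "qwen",  # placeholder model id
    "messages": [
        {"role": "user", "content": "Summarize this image-free prompt in one sentence."}
    ],
}

body = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(
    BASE_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Uncomment once the local service is running:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())
#     print(reply["choices"][0]["message"]["content"])

print(req.get_method(), req.full_url)
```

Because the server speaks the OpenAI wire format, existing OpenAI SDKs can also be pointed at the local base URL instead of the cloud endpoint, keeping all inference on-device.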

Highlighted Details

  • Broad Model Support: Integrates numerous models across text (Qwen, MiniCPM), vision (Qwen-VL), OCR (DeepSeek-OCR, PaddleOCR-VL), and speech (VoxCPM, Fun-ASR).
  • Local-First & Privacy: All AI processing occurs on the user's machine, ensuring data privacy and offline capability.
  • GPU Acceleration: Leverages CUDA for NVIDIA GPUs and Metal for Apple Silicon for significantly faster inference.
  • Optimized Inference: Includes optional support for Flash Attention to improve performance on long input sequences.

Maintenance & Community

The project shows active development with frequent updates to model support and features, as evidenced by its recent changelog. Contributions are welcomed, though specific community channels (like Discord/Slack) or major sponsorships are not detailed in the provided text.

Licensing & Compatibility

Licensed under Apache-2.0, a permissive license that permits commercial use and integration into closed-source projects, provided its attribution and notice requirements are met.

Limitations & Caveats

The Qwen3.5 4B model is noted as having unresolved issues. As a Rust project built on the Candle framework, adoption may also require familiarity with both technologies.

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
3
Star History
102 stars in the last 30 days

