ququ by yan5xu

Privacy-first Chinese desktop voice workflow for intelligent text processing

Created 3 months ago

1,888 stars

Top 22.8% on SourcePulse

Project Summary

QuQu (蛐蛐) is an open-source, free, desktop-based voice workflow tool designed as an alternative to Wispr Flow, specifically optimized for Chinese users. It prioritizes user privacy through local data processing and offers advanced text manipulation capabilities by integrating local speech recognition models with configurable large language models. This makes it suitable for individuals seeking a cost-effective, privacy-conscious, and powerful voice input solution for content creation, coding, and communication.

How It Works

QuQu employs a unique "two-stage engine" workflow: first, highly accurate Automatic Speech Recognition (ASR) using Alibaba's FunASR Paraformer model runs locally on the user's machine, ensuring data privacy and understanding nuanced Chinese internet language. Second, a Large Language Model (LLM) optimizes the transcribed text, automatically filtering out filler words, correcting speech errors in real-time (e.g., resolving self-corrections), and formatting output based on user-defined instructions. This approach allows QuQu to not only transcribe but also "understand" and "reshape" spoken language into desired text formats.

Quick Start & Requirements

Primary Install/Run: pnpm run dev after setup.
Prerequisites: Node.js 18+, pnpm, Python 3.8+, macOS 10.15+, Windows 10+, or Linux.
Setup: Three installation schemes are provided: using uv (recommended for automatic Python/dependency management), system Python with virtual environments, or an embedded Python environment for isolation. The uv method involves git clone, pnpm install, uv sync, uv run python download_models.py, and pnpm run dev.
Configuration: Requires API Key, Base URL, and Model Name for chosen AI services (OpenAI compatible, Tongyi Qianwen, Kimi, etc.) configured within the application settings.
Links: Project repository: https://github.com/yan5xu/ququ

Highlighted Details

Local ASR: Integrates FunASR (Paraformer-large) for privacy-preserving, high-accuracy Chinese speech-to-text.
Intelligent Optimization: LLM-powered post-processing corrects errors, filters speech disfluencies, and formats text contextually.
Flexible LLM Support: Compatible with OpenAI API and optimized for domestic Chinese LLMs like Tongyi Qianwen and Kimi.
Developer-Friendly: Accurately recognizes programming terms (camelCase, snake_case) and supports context-aware output formatting via custom AI instructions.
User Experience: Features a global hotkey (F2) for instant activation and seamless pasting of transcribed text to the current cursor position.

Maintenance & Community

The project welcomes community contributions through GitHub Issues for suggestions and bug reports, and Pull Requests for code contributions. Specific community channels like Discord or Slack are not detailed in the README.

Licensing & Compatibility

QuQu is licensed under the Apache License 2.0. This license is permissive and generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Initial setup requires installing both Node.js and Python environments, with multiple options potentially adding complexity for novice users. Downloading FunASR models may be slow or encounter network issues depending on the user's location. Some macOS users might need to manage SSL certificate issues by installing a specific urllib3 version. The project's primary focus on Chinese language and domestic models suggests potential limitations in support or performance for other languages.

ququ by yan5xu

Explore Similar Projects

leopard by Picovoice

amical by amicalhq

BiBi-Keyboard by BryceWG

FluidVoice by altic-dev

jarvis-ai-assistant by akshayaggarwal99

open-whispr by HeroTools

jarvis by llm-guy

ichigo by janhq

local-talking-llm by vndee

FireRedASR by FireRedTeam

tts by wangwangit

BELLE by LianjiaTech