dseditor
Local ASR tool for real-time transcription and subtitle generation
Top 96.1% on SourcePulse
Summary
This project provides a lightweight, local speech-to-text tool for generating subtitles from audio/video files and real-time microphone input. It targets both casual users needing a simple solution and power users seeking high accuracy via GPU acceleration, offering a free and accessible ASR experience.
How It Works
The tool employs Qwen3-ASR models, featuring a CPU-only mode with OpenVINO INT8 quantization and a GPU mode leveraging Vulkan via chatllm.cpp for enhanced accuracy with 1.7B GGUF models. It incorporates automatic voice activity detection (VAD), speaker diarization, multi-language support (30+), recognition hints, and an integrated subtitle editor.
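The role of voice activity detection can be illustrated with a minimal sketch. It assumes a simple energy threshold over fixed-size audio frames; the project's actual VAD method is not described here, and the function name `split_on_pauses` and its parameters are hypothetical.

```python
from typing import List, Tuple


def split_on_pauses(
    frame_energies: List[float],
    threshold: float = 0.01,      # assumed energy level separating speech from silence
    min_pause_frames: int = 3,    # assumed number of silent frames that ends a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments; end is exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None   # index where the current segment began, or None if idle
    silence = 0    # consecutive silent frames seen inside the current segment

    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause_frames:
                # Close the segment at the first silent frame of the pause.
                segments.append((start, i - silence + 1))
                start = None
                silence = 0

    if start is not None:
        # Audio ended mid-segment; drop any trailing silent frames.
        segments.append((start, len(frame_energies) - silence))
    return segments
```

Each returned segment could then be passed to the recognizer on its own. Short gaps below `min_pause_frames` stay inside a segment, so brief hesitations do not split a sentence.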
Quick Start & Requirements
Portable EXE versions work out of the box and download models automatically (~1.2 GB for CPU, ~2.3 GB for GPU). Installing from source requires Python 3.10+ and a git clone of the repository. Video processing relies on ffmpeg, which can be auto-downloaded. Minimum requirements are Windows 10/11 (64-bit), 6GB RAM for CPU mode, and a Vulkan 1.2+ compatible GPU with 8GB RAM for GPU mode.
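For a source installation, the prerequisites above can be checked before setup. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical and not part of the project, and a missing ffmpeg is not necessarily fatal because the tool can download it.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements


def check_prerequisites() -> dict:
    """Report whether the source-install prerequisites appear to be present."""
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "git_found": shutil.which("git") is not None,
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```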
Highlighted Details
chatllm.cpp supports NVIDIA, AMD, and Intel GPUs without CUDA/ROCm dependencies.
The ffmpeg installer simplifies deployment.
Maintenance & Community
Recent updates indicate active development, with features added throughout early 2024. No specific community channels or contributor details are provided.
Licensing & Compatibility
The core project code is MIT licensed. It integrates ffmpeg (GPL) for video processing and uses pre-compiled binaries from chatllm.cpp (MIT) for its Vulkan GPU backend. Model weights are subject to their respective source licenses. Compatibility is primarily for Windows 10/11 (64-bit).
Limitations & Caveats
The CPU-only mode offers "ordinary" recognition rates. Real-time transcription processes audio during pauses in speech rather than as a true continuous stream. Speaker diarization is less effective with simultaneous speech. The application path must not contain Chinese characters. CUDA support is in maintenance mode; Vulkan is the primary GPU focus.
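The path restriction can be checked before unpacking the application. This sketch assumes that "Chinese characters" means code points in the CJK Unified Ideographs block and its Extension A; the tool may also reject other non-ASCII characters, so an ASCII-only path is the safer choice. The helper name `contains_cjk` is hypothetical.

```python
def contains_cjk(text: str) -> bool:
    """Return True if text has a CJK Unified Ideograph (main block or Extension A)."""
    return any(
        0x4E00 <= ord(ch) <= 0x9FFF or 0x3400 <= ord(ch) <= 0x4DBF
        for ch in text
    )


if __name__ == "__main__":
    # Example install locations (illustrative only).
    for path in ("C:\\Tools\\asr", "C:\\工具\\asr"):
        status = "contains Chinese characters" if contains_cjk(path) else "ok"
        print(f"{path}: {status}")
```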