AutoArk: Unified audio model for speech tasks
Top 51.2% on SourcePulse
Summary
AutoArk/GPA is a single auto-regressive transformer model that handles Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Voice Conversion (VC). Aimed at researchers and developers, it reports near state-of-the-art performance on ASR and TTS in a compact architecture, which simplifies multi-task audio processing.
How It Works
GPA uses one auto-regressive transformer and treats speech understanding and generation as a single sequence-to-sequence problem. This lets one model handle diverse audio tasks and reduces system complexity. GPA-v1.5 is a larger, cleaner model with enhanced ASR and TTS capabilities, built on native PyTorch workflows.
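As a rough illustration of the single sequence-to-sequence framing, the sketch below shows one way a shared decoder could be told which task to perform: a task tag is prepended to the input tokens. The token names (`<asr>`, `<tts>`, `<vc>`, `<sep>`), the audio-token strings, and the `build_prompt` helper are all hypothetical; the source does not describe GPA's actual vocabulary or prompt layout.

```python
from typing import List

# Hypothetical special tokens. GPA's real vocabulary is not described in the source.
TASK_TOKENS = {"asr": "<asr>", "tts": "<tts>", "vc": "<vc>"}
SEP = "<sep>"  # hypothetical marker where the model would start generating


def build_prompt(task: str, source_tokens: List[str]) -> List[str]:
    """Prefix the source tokens with a task tag so one decoder can serve every task."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task}")
    return [TASK_TOKENS[task], *source_tokens, SEP]


# ASR: discrete audio tokens in, text tokens expected after <sep>.
asr_prompt = build_prompt("asr", ["<a12>", "<a7>", "<a33>"])

# TTS: text tokens in, audio tokens expected after <sep>.
tts_prompt = build_prompt("tts", ["hello", "world"])
```

Under this assumed layout, every task reduces to next-token prediction over one mixed vocabulary of text and audio tokens, which is what lets a single set of weights cover all three tasks.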
Quick Start & Requirements
Installation typically involves Hugging Face models and PyTorch. Deployment options include native PyTorch/Hugging Face inference, an ONNX Runtime for CLI, FastAPI services, and browser UIs. GPU acceleration is recommended.
Key resources:
Hugging Face Models: https://huggingface.co/AutoArk-AI/GPA-v1.5
ONNX Runtime Assets: https://huggingface.co/AutoArk-AI/GPA-v1.5-onnx-runtime
Inference Guide: GPA_1.5/docs/infer.md
ONNX Runtime Guide: GPA_1.5/onnx_runtime/README.md
Demo: https://autoark.github.io/GPA/
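A minimal sketch of fetching the published assets, assuming the `huggingface_hub` package is installed and that the two repositories named above can be downloaded as ordinary model repos. The `fetch_assets` helper and the target directory names are illustrative and not part of GPA's own tooling; the actual loading and inference steps are covered by the project's Inference Guide.

```python
# Repository IDs taken from the Hugging Face links in this section.
MODEL_REPO = "AutoArk-AI/GPA-v1.5"
ONNX_REPO = "AutoArk-AI/GPA-v1.5-onnx-runtime"


def hub_url(repo_id: str) -> str:
    """Return the Hugging Face page URL for a repository ID."""
    return f"https://huggingface.co/{repo_id}"


def fetch_assets(repo_id: str, target_dir: str) -> str:
    """Download every file in a Hub repo into target_dir and return the local path.

    Hypothetical helper: imported lazily so this module loads without huggingface_hub.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=target_dir)


if __name__ == "__main__":
    # Requires network access; directory names are arbitrary.
    print(fetch_assets(MODEL_REPO, "./gpa-v1.5"))
    print(fetch_assets(ONNX_REPO, "./gpa-v1.5-onnx"))
```

Downloading the PyTorch weights and the ONNX runtime assets separately keeps the two deployment paths independent, so a CLI-only setup would not need the full model checkpoint.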
Maintenance & Community
Announcements from April 2026 indicate active development, particularly around GPA-v1.5 and its ONNX runtime. The project is hosted by AutoArk, with Hugging Face and GitHub Pages as its primary hubs. No direct community channels such as Discord or Slack are listed.
Licensing & Compatibility
The provided README does not specify a software license, so usage rights and commercial compatibility remain unclear.
Limitations & Caveats
Native Voice Conversion support for GPA-v1.5 is under development. Key features like an interactive demo and basic service deployment (vLLM/FastAPI) for GPA-v1.5 are planned but not yet released. RKNN support is also pending. An archive exists for the older GPA-v1.0 release.