GPA by AutoArk

Unified audio model for speech tasks

Created 7 months ago

1,533 stars

Top 26.2% on SourcePulse

Project Summary

AutoArk/GPA offers a single, unified auto-regressive transformer model for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Voice Conversion (VC). Aimed at researchers and developers, it delivers near state-of-the-art ASR and TTS performance in a compact architecture, simplifying multi-task audio processing.

How It Works

GPA employs a unified auto-regressive transformer architecture, treating speech understanding and generation as a single sequence-to-sequence problem. This allows one model to handle diverse audio tasks, reducing complexity. GPA-v1.5 is a larger, cleaner model with enhanced ASR and TTS capabilities, built on native PyTorch workflows.
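To make the "single sequence-to-sequence problem" framing concrete, here is a minimal sketch of how different speech tasks could share one prompt layout and one decoding loop. The special tokens, `build_prompt`, `greedy_decode`, and `toy_model` are all hypothetical names chosen for illustration; this summary does not describe GPA's actual vocabulary or decoding code.

```python
from typing import Callable, List

# Hypothetical special tokens; GPA's real vocabulary is not described here.
TASK_TOKENS = {"asr": "<asr>", "tts": "<tts>", "vc": "<vc>"}
BOS, SEP, EOS = "<bos>", "<sep>", "<eos>"


def build_prompt(task: str, source: List[str]) -> List[str]:
    """Frame any task as one token sequence: <bos> <task> source... <sep>."""
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task}")
    return [BOS, TASK_TOKENS[task], *source, SEP]


def greedy_decode(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new_tokens: int = 32,
) -> List[str]:
    """Generic auto-regressive loop: append one predicted token at a time."""
    sequence = list(prompt)
    generated: List[str] = []
    for _ in range(max_new_tokens):
        token = next_token(sequence)
        if token == EOS:
            break
        sequence.append(token)
        generated.append(token)
    return generated


def toy_model(sequence: List[str]) -> str:
    """Stand-in for the transformer: echoes the source tokens in upper case."""
    sep = sequence.index(SEP)
    source = sequence[2:sep]            # tokens between the task tag and <sep>
    produced = len(sequence) - sep - 1  # tokens generated so far
    return source[produced].upper() if produced < len(source) else EOS
```

In this sketch only the task token and the kind of source tokens change between ASR, TTS, and VC; the decoding loop stays the same, which is the simplification a unified model is meant to provide.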

Quick Start & Requirements

Installation typically involves Hugging Face models and PyTorch. Deployment options include native PyTorch/Hugging Face inference and an ONNX Runtime path that can be used from a CLI, a FastAPI service, or a browser UI. GPU acceleration is recommended.

Key resources:

Hugging Face Models: https://huggingface.co/AutoArk-AI/GPA-v1.5
ONNX Runtime Assets: https://huggingface.co/AutoArk-AI/GPA-v1.5-onnx-runtime
Inference Guide: GPA_1.5/docs/infer.md
ONNX Runtime Guide: GPA_1.5/onnx_runtime/README.md
Demo: https://autoark.github.io/GPA/
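As a starting point, the two Hugging Face repositories listed in the key resources could be fetched with the `huggingface_hub` package. This is a sketch under assumptions, not the project's documented install procedure: the helper names `repo_url` and `download` are illustrative, and the actual loading and inference steps are described in the project's own inference guide.

```python
# Repository IDs taken from the resource URLs in this summary.
MODEL_REPO = "AutoArk-AI/GPA-v1.5"
ONNX_REPO = "AutoArk-AI/GPA-v1.5-onnx-runtime"


def repo_url(repo_id: str) -> str:
    """Return the Hugging Face web URL for a repository ID."""
    return f"https://huggingface.co/{repo_id}"


def download(repo_id: str, local_dir: str) -> str:
    """Download a full repository snapshot and return its local path.

    Assumes `pip install huggingface_hub`; the import is deferred so the
    module can be loaded without the package installed.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download(MODEL_REPO, "./gpa-v1.5")` would pull the PyTorch weights, and `download(ONNX_REPO, "./gpa-v1.5-onnx")` the ONNX runtime assets; both target directories are arbitrary examples.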

Highlighted Details

GPA-v1.5 delivers near-SOTA ASR and TTS performance within a unified 0.6B-parameter model.
ONNX Runtime enables cross-platform deployment via CLI, FastAPI, and browser UI.
GPA-TTS is a standalone, lightweight TTS runtime optimized for edge deployment with INT4/INT8 quantization and zero-shot voice cloning.
TTS decoders offer selectable precision (INT8, FP16, FP32).
Integrates with PyTorch, vLLM, llama-cpp, sglang, and mlx-lm.
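For readers unfamiliar with the INT8 option mentioned above, the sketch below shows generic symmetric per-tensor INT8 quantization in plain Python. It illustrates the general technique only; the scheme GPA-TTS actually uses (granularity, calibration, INT4 packing) is not described in this summary, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [q * scale for q in quantized]
```

Each weight is stored in one byte instead of four (FP32) or two (FP16), at the cost of a rounding error of at most about half the scale per value, which is the trade-off behind offering several decoder precisions.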

Maintenance & Community

Announcements from April 2026 indicate active development, particularly around GPA-v1.5 and its ONNX runtime. The project is hosted by AutoArk, and its primary hubs are Hugging Face and GitHub Pages. No direct community channels such as Discord or Slack are listed.

Licensing & Compatibility

The README does not specify a software license, so usage rights and commercial compatibility need to be confirmed before adoption.

Limitations & Caveats

Native Voice Conversion support for GPA-v1.5 is under development. Key features like an interactive demo and basic service deployment (vLLM/FastAPI) for GPA-v1.5 are planned but not yet released. RKNN support is also pending. An archive exists for the older GPA-v1.0 release.

Explore Similar Projects

LinaCodec by ysharma3501

onnx-asr by istupakov

VoiceStar by jasonppy

VITA-Audio by VITA-MLLM

f5-tts-mlx by lucasnewman

GLM-ASR by zai-org

TensorflowASR by Z-yq

HierSpeechpp by sh-lee-prml

faster-qwen3-tts by andimarafioti

moonshine by moonshine-ai

metavoice-src by metavoiceio

Spark-TTS by SparkAudio