GPA by AutoArk

Unified audio model for speech tasks

Created 5 months ago
645 stars

Top 51.2% on SourcePulse

Project Summary

Summary

AutoArk/GPA offers a single, unified auto-regressive transformer model for Automatic Speech Recognition (ASR), Text-to-Speech (TTS), and Voice Conversion (VC). Aimed at researchers and developers, it delivers near state-of-the-art ASR and TTS performance in a compact architecture, which simplifies multi-task audio processing.

How It Works

GPA employs a unified auto-regressive transformer architecture that treats speech understanding and generation as a single sequence-to-sequence problem. One model can therefore handle diverse audio tasks, instead of a separate system per task. GPA-v1.5 is a larger, cleaner model with enhanced ASR and TTS capabilities, built on native PyTorch workflows.
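To make the "single sequence-to-sequence problem" idea concrete, here is a minimal sketch of how different speech tasks could be laid out as one token stream for a shared decoder. Everything in it is an assumption made for illustration: the task tokens, the `<a_N>` audio-code notation, the character-level text tokens, and the `build_prompt` helper are hypothetical and are not taken from the GPA codebase.

```python
from typing import List, Optional, Sequence

# Hypothetical special tokens; GPA's real vocabulary is not described in the source.
TASK_TOKENS = {"asr": "<|asr|>", "tts": "<|tts|>", "vc": "<|vc|>"}
BOS, SEP = "<|bos|>", "<|sep|>"


def audio_to_tokens(codes: Sequence[int]) -> List[str]:
    """Render discrete audio codes as placeholder tokens such as <a_12>."""
    return [f"<a_{c}>" for c in codes]


def build_prompt(
    task: str,
    text: Optional[str] = None,
    audio_codes: Optional[Sequence[int]] = None,
    reference_codes: Optional[Sequence[int]] = None,
) -> List[str]:
    """Lay out the inputs of one task as a single token sequence.

    A shared auto-regressive decoder would continue this sequence:
    text tokens for ASR, audio tokens for TTS and VC.
    """
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task}")
    prompt = [BOS, TASK_TOKENS[task]]
    if task == "asr":
        if audio_codes is None:
            raise ValueError("asr needs audio_codes")
        prompt += audio_to_tokens(audio_codes)
    elif task == "tts":
        if text is None:
            raise ValueError("tts needs text")
        prompt += list(text)  # character-level stand-in for a real tokenizer
    else:  # vc: source speech followed by a reference voice
        if audio_codes is None or reference_codes is None:
            raise ValueError("vc needs audio_codes and reference_codes")
        prompt += audio_to_tokens(audio_codes) + [SEP] + audio_to_tokens(reference_codes)
    prompt.append(SEP)  # the model's output would start after this separator
    return prompt
```

For example, `build_prompt("asr", audio_codes=[3, 7])` and `build_prompt("tts", text="hi")` produce sequences of the same shape that differ only in the task token and the payload, which is the property that lets one decoder serve several tasks.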

Quick Start & Requirements

Installation typically involves Hugging Face models and PyTorch. Deployment options include native PyTorch/Hugging Face inference and an ONNX Runtime path with a CLI, FastAPI services, and browser UIs. GPU acceleration is recommended.

Key resources:

  • Hugging Face Models: https://huggingface.co/AutoArk-AI/GPA-v1.5
  • ONNX Runtime Assets: https://huggingface.co/AutoArk-AI/GPA-v1.5-onnx-runtime
  • Inference Guide: GPA_1.5/docs/infer.md
  • ONNX Runtime Guide: GPA_1.5/onnx_runtime/README.md
  • Demo: https://autoark.github.io/GPA/
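As a starting point, the published weights can be fetched from the Hugging Face Hub before following the inference guide. The sketch below uses `snapshot_download` from the `huggingface_hub` package; the repository IDs come from the links above, while the `checkpoints` directory and the helper names `local_dir_for` and `fetch_weights` are assumptions chosen for illustration. It only downloads files and does not show how GPA itself is loaded, which is covered by the project's own inference guide.

```python
from pathlib import Path

# Repository IDs taken from the Hugging Face links listed for the project.
MODEL_REPO = "AutoArk-AI/GPA-v1.5"
ONNX_REPO = "AutoArk-AI/GPA-v1.5-onnx-runtime"


def local_dir_for(repo_id: str, target_dir: str = "checkpoints") -> Path:
    """Map a Hub repo ID to a local folder, e.g. checkpoints/GPA-v1.5."""
    return Path(target_dir) / repo_id.split("/")[-1]


def fetch_weights(repo_id: str = MODEL_REPO, target_dir: str = "checkpoints") -> Path:
    """Download every file of a Hub repository and return the local path.

    Requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so this module can be loaded without the dependency.
    from huggingface_hub import snapshot_download

    local_dir = local_dir_for(repo_id, target_dir)
    path = snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
    return Path(path)
```

Calling `fetch_weights()` downloads the PyTorch checkpoint, and `fetch_weights(ONNX_REPO)` does the same for the ONNX Runtime assets.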

Highlighted Details

  • GPA-v1.5 delivers near-SOTA ASR and TTS performance within a unified 0.6B parameter model.
  • ONNX Runtime enables cross-platform deployment via CLI, FastAPI, and browser UI.
  • GPA-TTS is a standalone, lightweight TTS runtime optimized for edge deployment with INT4/INT8 quantization and zero-shot voice cloning.
  • TTS decoders offer selectable precision (INT8, FP16, FP32).
  • Integrates with PyTorch, vLLM, llama-cpp, sglang, and mlx-lm.
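The precision options above matter mostly for memory footprint. As a rough back-of-envelope sketch, the snippet below estimates weight storage for a 0.6B-parameter model at each precision. It assumes every parameter is stored at the named precision and ignores activations, caches, and quantization metadata such as scales, so real file sizes will differ; the function name `weight_size_gb` is illustrative only.

```python
# Parameter count from the highlighted details (0.6B).
NUM_PARAMS = 0.6e9

# Storage cost per parameter, in bytes, for each precision.
BYTES_PER_PARAM = {"FP32": 4.0, "FP16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_size_gb(num_params: float, precision: str) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_size_gb(NUM_PARAMS, precision):.1f} GB")
# FP32: ~2.4 GB
# FP16: ~1.2 GB
# INT8: ~0.6 GB
# INT4: ~0.3 GB
```

Under these assumptions, moving from FP32 to INT4 cuts weight storage by a factor of eight, which is the kind of saving that makes edge deployment plausible.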

Maintenance & Community

Announcements from April 2026 indicate active development, particularly around GPA-v1.5 and its ONNX runtime. The project is hosted by AutoArk, with Hugging Face and GitHub Pages as its primary hubs; no direct community channels such as Discord or Slack are listed.

Licensing & Compatibility

The provided README does not specify a software license, so usage rights and commercial compatibility remain unclear.

Limitations & Caveats

Native Voice Conversion support for GPA-v1.5 is still under development. An interactive demo and basic service deployment (vLLM/FastAPI) for GPA-v1.5 are planned but not yet released, and RKNN support is also pending. The older GPA-v1.0 release is available as an archive.

Health Check
Last Commit: 2 weeks ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 0
Star History: 540 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 1 more.

moonshine by moonshine-ai

0.3%
8k
Speech-to-text models optimized for fast, accurate ASR on edge devices
Created 1 year ago
Updated 1 week ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Pietro Schirano (Founder of MagicPath), and 2 more.

metavoice-src by metavoiceio

0.0%
4k
TTS model for human-like, expressive speech
Created 2 years ago
Updated 1 year ago