Speech package for local, real-time voice AI development
Ichigo is a Python package that gives developers local, real-time speech AI capabilities, focusing on Automatic Speech Recognition (ASR) and experimental speech language modeling with large language models (LLMs). It aims to simplify speech tasks by offering intuitive Python interfaces and a scalable FastAPI service that abstract away complex audio processing.
How It Works
Ichigo-ASR is a compact (22M parameters) speech tokenizer based on Whisper-medium, designed for efficient multilingual performance. It converts speech into discrete tokens, which makes speech input easier for LLMs to consume directly. This approach, inspired by early fusion techniques, allows for modularity and potential cross-task training, so that ASR data can inform TTS models and vice versa.
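To make "converting speech into discrete tokens" concrete, here is a minimal sketch of nearest-neighbour codebook quantization, one common way to turn continuous feature frames into token IDs. It is not Ichigo's actual implementation: the function names, the toy codebook, and the two-dimensional frames are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature frame to the index of its nearest codebook entry.

    The returned indices play the role of discrete speech tokens.
    """
    tokens = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        tokens.append(distances.index(min(distances)))
    return tokens


# Hypothetical toy data: a 3-entry codebook and 4 audio feature frames.
# A real tokenizer would use a learned codebook over encoder features.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [-0.8, 1.9], [0.2, 0.1]]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 0]
```

Because the output is a plain sequence of integers, it can in principle be fed to a language model in the same way as text token IDs, which is the compatibility the paragraph above refers to.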
Quick Start & Requirements
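Install with pip:
pip install ichigo
Start the API service with uvicorn:
uvicorn asr:app --host 0.0.0.0 --port 8000
or run it via Docker. Once the service is running, the API documentation is available at http://localhost:8000/docs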
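Once the service is up, one way to see which endpoints it exposes is to read the OpenAPI schema that FastAPI applications serve at /openapi.json by default (the same schema that backs the /docs page). The sketch below assumes the service keeps that default and listens on localhost:8000; the helper names and the sample schema, including its endpoint path, are hypothetical and do not describe Ichigo's real routes.

```python
import json
from urllib.request import urlopen

# HTTP methods that can appear as keys of an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema from a running FastAPI service.

    Assumes the app has not disabled or moved the default /openapi.json route.
    """
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)


def list_routes(spec: dict) -> list:
    """Return sorted (METHOD, path) pairs found in an OpenAPI schema."""
    routes = []
    for path, operations in sorted(spec.get("paths", {}).items()):
        for method in sorted(operations):
            if method in HTTP_METHODS:  # skip keys such as "parameters"
                routes.append((method.upper(), path))
    return routes


# Hypothetical schema used only to show the output shape without a live server.
sample_spec = {
    "paths": {
        "/hypothetical-endpoint": {"post": {}, "parameters": []},
        "/health": {"get": {}},
    }
}

sample_routes = list_routes(sample_spec)
print(sample_routes)
```

Against a live service, `list_routes(fetch_openapi())` would list the actual routes instead of the sample ones.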
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats