zai-org: Robust speech recognition model for challenging audio
Top 50.5% on SourcePulse
Summary
GLM-ASR-Nano is an open-source automatic speech recognition (ASR) model designed to handle real-world audio complexities. It targets users needing robust transcription, especially for diverse dialects and low-volume speech, offering a competitive alternative to existing models like Whisper V3.
How It Works
This 1.5 billion parameter model employs a robust architecture optimized for challenging acoustic environments. Its design prioritizes exceptional dialect support, including Cantonese, and specialized training for low-volume or quiet speech scenarios. This approach aims to achieve state-of-the-art performance, particularly on Chinese benchmarks, by effectively capturing nuances often missed by conventional ASR systems.
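Chinese ASR benchmarks are typically scored by character error rate (CER) rather than word error rate. For reference, here is a minimal sketch of how such a score is computed; this is generic scoring code, not part of the model's own tooling:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # Standard edit-distance dynamic program over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(cer("今天天气很好", "今天气很好"))  # one deleted character out of six -> 1/6
```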
Quick Start & Requirements
Installation involves pip install -r requirements.txt plus a system ffmpeg (sudo apt install ffmpeg). Inference can be run through the transformers library (5.x is supported), vLLM, or SGLang; example inference scripts for English and Chinese audio are provided.
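The ffmpeg requirement exists because ASR checkpoints generally expect 16 kHz mono input, so arbitrary audio must be resampled first. As a dependency-free illustration of that preprocessing step (not the project's own code), linear-interpolation resampling can be sketched as:

```python
def resample(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample audio by linear interpolation from src_rate to dst_rate."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate        # fractional index into the source
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

audio_44k = [0.0] * 44100                    # one second of silence at 44.1 kHz
audio_16k = resample(audio_44k, 44100, 16000)
print(len(audio_16k))                        # 16000 samples: one second at 16 kHz
```

In practice ffmpeg does this (plus format decoding) in one step, e.g. ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav.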
Highlighted Details
Maintenance & Community
The project mentions a WeChat community for engagement. No specific details on core maintainers, sponsorships, or roadmap are provided in the README.
Licensing & Compatibility
The README does not explicitly state the license type or any compatibility notes for commercial use.
Limitations & Caveats
The README does not detail specific limitations, alpha status, known bugs, or unsupported platforms. However, the model's specialization in Chinese dialects and low-volume speech may imply weaker coverage of other languages or standard-volume audio compared to more general-purpose models.