Showing 1 - 25 of 294 repos
# | Repository | Description | Stars | Stars 7d Δ | Stars 7d % | PRs 7d Δ | Created | Response rate | Last active | Issues 30d |
---|---|---|---|---|---|---|---|---|---|---|
1 | awesome-speech-enhancement (WenzheLiu-Speech) | A curated list of speech enhancement, dereverberation, and speech separation resources (papers, code, tools). Covers traditional and neural ... | 1k Top 50% | 3 | 0.3% | 0 | 5y ago | Inactive | 1y ago | |
2 | | General-purpose speech recognition model. It performs multilingual speech recognition, speech translation, and language identification. | 86k Top 1% | 377 | 0.4% | 0 | 2y ago | Inactive | 1mo ago | |
3 | csm-mlx (senstella) | Text-to-speech model implemented in MLX, Apple's machine learning framework. It supports context input, quantization, and streaming. | 367 | 1 | 0.3% | 0 | 4mo ago | Inactive | 2mo ago | |
4 | local-talking-llm (vndee) | Build a local voice assistant with speech-to-text (Whisper), LLM (Ollama, Llama-2), and text-to-speech (Bark). Supports voice-based interact... | 530 | 5 | 0.9% | 0 | 1y ago | 1 week | 2mo ago | |
5 | Meta-voicebox (SpeechifyInc) | Implementation of Voicebox, a text-guided multilingual speech generation model. It performs zero-shot TTS, noise removal, and style conversi... | 583 | 0 | 0% | 0 | 2y ago | Inactive | 2y ago | |
6 | | Generative speech model for daily dialogue, optimized for conversational TTS. It supports multiple speakers and fine-grained prosodic contro... | 37k Top 1% | 80 | 0.2% | 0 | 1y ago | 1 day | 3w ago | |
7 | Step-Audio (stepfun-ai) | Open-source framework for intelligent speech interaction. It supports multilingual conversations, voice cloning, and controllable speech syn... | 4k Top 25% | 7 | 0.2% | 1 | 5mo ago | 1 day | 1mo ago | |
8 | Freeze-Omni (VITA-MLLM) | Speech-to-speech dialogue model built on a frozen LLM. It features chunk-wise streaming input, AR-based speech output, and state prediction.... | 334 | 1 | 0.3% | 0 | 9mo ago | 1 day | 2mo ago | |
9 | QuickAgent (gkamradt) | Voice bot demo using Text-To-Speech, Speech-To-Text, and a language model to have a conversation with a user. Utilizes streaming. | 371 | 1 | 0.3% | 0 | 1y ago | 1 week | 1y ago | |
10 | LLaSA_training (zhenye234) | Text-to-speech model trained on 250k hours of speech data. It uses a unified tokenizer for both speech (X-codec2) and text (LLaMA). | 595 | 2 | 0.3% | 0 | 6mo ago | 1 week | 3mo ago | |
11 | | Speech generation model that generates RVQ audio codes from text and audio inputs. It employs a Llama backbone and an audio decoder. | 14k Top 5% | 46 | 0.3% | 0 | 5mo ago | 1 week | 2mo ago | |
12 | SpeechGPT (0nutation) | Speech Large Language Models capable of perceiving and generating multi-modal content following multi-modal human instructions. Includes dat... | 1k Top 50% | 2 | 0.1% | 0 | 2y ago | 1 day | 1y ago | |
13 | Voila (maitrix-org) | Voice-language foundation models for real-time, low-latency voice interaction. It supports ASR, TTS, and voice translation across six langua... | 429 | 1 | 0.2% | 0 | 4mo ago | Inactive | 2mo ago | |
14 | pheme (PolyAI-LDN) | Framework for efficient, conversational TTS model training and inference. Uses semantic/acoustic token separation and MaskGit-style parallel... | 260 | 0 | 0% | 0 | 1y ago | 1 week | 1y ago | |
15 | SpeechGPT-2.0-preview (OpenMOSS) | End-to-end speech dialogue model trained on millions of hours of speech data. It features low-latency response and natural, human-like speec... | 347 | 1 | 0.3% | 0 | 6mo ago | Inactive | 6mo ago | |
16 | ASR-LLM-TTS (ABexit) | Voice interaction framework using SenseVoice ASR, Qwen2.5 LLM, and TTS (CosyVoice, pyttsx3, edgeTTS). Includes voiceprint recognition and K... | 876 Top 50% | 10 | 1.1% | 0 | 8mo ago | 1+ week | 5mo ago | |
17 | | Speech-language model built upon Llama-3. It supports low-latency and high-quality speech interactions, generating both text and speech. | 3k Top 25% | 3 | 0.1% | 0 | 10mo ago | 1 day | 2mo ago | |
18 | talking_avatar (bornfree) | A ThreeJS-powered virtual human that uses Azure APIs for speech synthesis. Can be combined with a chat model for an interactive avatar. | 369 | 2 | 0.6% | 0 | 2y ago | Inactive | 1mo ago | |
19 | twewy-discord-chatbot (RuolinZheng08) | Discord chatbot using a fine-tuned conversational model. The model is trained on a character's lines and hosted on Hugging Face's Model Hub.... | 316 | 1 | 0.3% | 0 | 4y ago | 1 day | 2y ago | |
20 | Qwen2-Audio (QwenLM) | Large-scale audio-language model for audio analysis and voice chat. It accepts audio inputs and performs audio analysis or textual responses... | 2k Top 25% | 8 | 0.4% | 0 | 1y ago | 1 day | 3mo ago | |
21 | ZipVoice (k2-fsa) | Fast, high-quality zero-shot TTS with flow matching. Supports voice cloning, multi-lingual, and dialogue generation. | 336 | 19 | 5.9% | 1 | 1mo ago | Inactive | 3d ago | |
22 | smart-turn (pipecat-ai) | Open-source audio turn detection model. Uses Wav2Vec2-BERT to determine when a voice agent should respond to human speech. Supports English.... | 840 Top 50% | 18 | 2.2% | 0 | 4mo ago | 1 day | 1w ago | |
23 | mini-omni2 (gpt-omni) | Omni-interactive model that understands image, audio, and text inputs. Features real-time voice output and flexible interaction. | 2k Top 25% | 4 | 0.2% | 0 | 9mo ago | 1 week | 6mo ago | |
24 | | End-to-end text-to-speech model using variational inference, normalizing flows, and adversarial training. Includes a stochastic duration pre... | 8k Top 10% | 15 | 0.2% | 0 | 4y ago | Inactive | 1y ago | |
25 | INTERSPEECH-2023-24-Papers (DmitryRyumin) | A curated list of speech and language processing papers from INTERSPEECH 2023 & 2024, covering ASR, speech synthesis, and more. Includes cod... | 678 | 0 | 0% | 0 | 2y ago | Inactive | 7mo ago | |