Search results

294 results for "conversational speech model"

25 of 294 repos

Repository
Description
Stars
Stars 7d Δ
Stars 7d %
PRs 7d Δ
Created
Response rate
Issues 30d
Last active

1

A curated list of speech enhancement, dereverberation, and speech separation resources (papers, code, tools). Covers traditional and neural ...
1k
Top 50%
3
0.3%
0
5y ago

Inactive

1y ago

2

whisper (openai)
Starred by: bcherny, karpathy, swyxio, taranjeet, +17
General-purpose speech recognition model. It performs multilingual speech recognition, speech translation, and language identification.
86k
Top 1%
377
0.4%
0
2y ago

Inactive

1mo ago

3

csm-mlx (senstella)
Text-to-speech model implemented in MLX, Apple's machine learning framework. It supports context input, quantization, and streaming.
367
1
0.3%
0
4mo ago

Inactive

2mo ago

4

Build a local voice assistant with speech-to-text (Whisper), LLM (Ollama, Llama-2), and text-to-speech (Bark). Supports voice-based interact...
530
5
0.9%
0
1y ago

1 week

2mo ago

5

Meta-voicebox (SpeechifyInc)
Implementation of Voicebox, a text-guided multilingual speech generation model. It performs zero-shot TTS, noise removal, and style conversi...
583
0
0%
0
2y ago

Inactive

2y ago

6

ChatTTS (2noise)
Starred by: osanseviero
Generative speech model for daily dialogue, optimized for conversational TTS. It supports multiple speakers and fine-grained prosodic contro...
37k
Top 1%
80
0.2%
0
1y ago

1 day

3w ago

7

Step-Audio (stepfun-ai)
Open-source framework for intelligent speech interaction. It supports multilingual conversations, voice cloning, and controllable speech syn...
4k
Top 25%
7
0.2%
1
5mo ago

1 day

1mo ago

8

Freeze-Omni (VITA-MLLM)
Speech-to-speech dialogue model built on a frozen LLM. It features chunk-wise streaming input, AR-based speech output, and state prediction....
334
1
0.3%
0
9mo ago

1 day

2mo ago

9

QuickAgent (gkamradt)
Voice bot demo using Text-To-Speech, Speech-To-Text, and a language model to have a conversation with a user. Utilizes streaming.
371
1
0.3%
0
1y ago

1 week

1y ago

10

Text-to-speech model trained on 250k hours of speech data. It uses a unified tokenizer for both speech (X-codec2) and text (LLaMA).
595
2
0.3%
0
6mo ago

1 week

3mo ago

11

csm (SesameAILabs)
Starred by: thomwolf, dguido, transitive-bullshit
Speech generation model that generates RVQ audio codes from text and audio inputs. It employs a Llama backbone and an audio decoder.
14k
Top 5%
46
0.3%
0
5mo ago

1 week

2mo ago

12

SpeechGPT (0nutation)
Speech Large Language Models capable of perceiving and generating multi-modal content following multi-modal human instructions. Includes dat...
1k
Top 50%
2
0.1%
0
2y ago

1 day

1y ago

13

Voila (maitrix-org)
Voice-language foundation models for real-time, low-latency voice interaction. It supports ASR, TTS, and voice translation across six langua...
429
1
0.2%
0
4mo ago

Inactive

2mo ago

14

pheme (PolyAI-LDN)
Framework for efficient, conversational TTS model training and inference. Uses semantic/acoustic token separation and MaskGit-style parallel...
260
0
0%
0
1y ago

1 week

1y ago

15

End-to-end speech dialogue model trained on millions of hours of speech data. It features low-latency response and natural, human-like speec...
347
1
0.3%
0
6mo ago

Inactive

6mo ago

16

Voice interaction framework using SenseVoice ASR, Qwen2.5 LLM, and TTS (CosyVoice, pyttsx3, edgeTTS). Includes voiceprint recognition and K...
876
Top 50%
10
1.1%
0
8mo ago

1+ week

5mo ago

17

LLaMA-Omni (ictnlp)
Starred by: jph00
Speech-language model built upon Llama-3. It supports low-latency and high-quality speech interactions, generating both text and speech.
3k
Top 25%
3
0.1%
0
10mo ago

1 day

2mo ago

18

A ThreeJS-powered virtual human that uses Azure APIs for speech synthesis. Can be combined with a chat model for an interactive avatar.
369
2
0.6%
0
2y ago

Inactive

1mo ago

19

Discord chatbot using a fine-tuned conversational model. The model is trained on a character's lines and hosted on Hugging Face's Model Hub....
316
1
0.3%
0
4y ago

1 day

2y ago

20

Large-scale audio-language model for audio analysis and voice chat. It accepts audio inputs and performs audio analysis or textual responses...
2k
Top 25%
8
0.4%
0
1y ago

1 day

3mo ago

21

ZipVoice (k2-fsa)
Fast, high-quality zero-shot TTS with flow matching. Supports voice cloning, multilingual synthesis, and dialogue generation.
336
19
5.9%
1
1mo ago

Inactive

3d ago

22

smart-turn (pipecat-ai)
Open-source audio turn detection model. Uses Wav2Vec2-BERT to determine when a voice agent should respond to human speech. Supports English....
840
Top 50%
18
2.2%
0
4mo ago

1 day

1w ago

23

mini-omni2 (gpt-omni)
Omni-interactive model that understands image, audio, and text inputs. Features real-time voice output and flexible interaction.
2k
Top 25%
4
0.2%
0
9mo ago

1 week

6mo ago

24

vits (jaywalnut310)
Starred by: osanseviero
End-to-end text-to-speech model using variational inference, normalizing flows, and adversarial training. Includes a stochastic duration pre...
8k
Top 10%
15
0.2%
0
4y ago

Inactive

1y ago

25

A curated list of speech and language processing papers from INTERSPEECH 2023 & 2024, covering ASR, speech synthesis, and more. Includes cod...
678
0
0%
0
2y ago

Inactive

7mo ago
