bark by suno-ai

Generative audio model for realistic speech and sound effects

created 2 years ago
38,289 stars

Top 0.8% on sourcepulse

Project Summary

Bark is a text-prompted generative audio model that produces realistic, multilingual speech as well as music and sound effects. It is aimed at researchers and power users who want a flexible audio generation tool that goes beyond traditional text-to-speech. Bark offers creative control and can generate non-speech sounds such as laughter and sighs, with a focus on realistic voice and prosody.

How It Works

Bark is a transformer-based, fully generative text-to-audio model, similar to AudioLM and Vall-E. It uses a quantized audio representation from EnCodec. Unlike conventional TTS, Bark converts text directly to audio without intermediate phonemes, which lets it generalize to music, sound effects, and other non-speech sounds. It supports voice presets and tries to match the tone, pitch, and emotion of the chosen preset, but it does not support custom voice cloning.
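A minimal usage sketch of text-to-audio generation with a voice preset, assuming the `bark` package from the suno-ai repository and SciPy are installed. `generate_audio`, `preload_models`, `SAMPLE_RATE`, and the `history_prompt` argument are the names used in the project's README. The helpers `preset_name` and `speak` are hypothetical and exist only for illustration, and the `v2/<lang>_speaker_<n>` naming pattern with indices 0 to 9 is an assumption based on the README's preset examples.

```python
def preset_name(lang: str, speaker: int) -> str:
    """Build a preset identifier such as 'v2/en_speaker_6' (assumed naming pattern)."""
    if not 0 <= speaker <= 9:
        raise ValueError("speaker index is assumed to run from 0 to 9")
    return f"v2/{lang}_speaker_{speaker}"


def speak(text: str, preset: str, out_path: str) -> None:
    """Generate audio for `text` with a voice preset and write it as a WAV file."""
    # Imported lazily so this module still loads where bark is not installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches the model weights on first use
    audio = generate_audio(text, history_prompt=preset)  # NumPy array of samples
    write_wav(out_path, SAMPLE_RATE, audio)


# Example call (needs bark and its model weights, so it is left commented out):
# speak("Hello, my name is Suno.", preset_name("en", 6), "bark_generation.wav")
```

Because the model is fully generative, repeated calls with the same text and preset can produce noticeably different audio.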

Quick Start & Requirements

Highlighted Details

  • Supports 100+ speaker presets across multiple languages.
  • Can generate music, background noise, and sound effects alongside speech.
  • Offers automatic language detection and accent generation for code-switched text.
  • Long-form generation capabilities are documented in provided notebooks.
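The non-speech and music capabilities listed above are driven by the prompt text itself. The sketch below shows one way such prompts might be composed, assuming the bracketed cues (such as `[laughs]` or `[sighs]`) and the `♪` lyric markers described in the project's README. The helper names `with_cue` and `as_lyrics` are hypothetical, and the model is not guaranteed to honor any cue.

```python
def with_cue(text: str, cue: str) -> str:
    """Prefix text with a bracketed non-speech cue, e.g. '[laughs] ...'."""
    return f"[{cue}] {text}"


def as_lyrics(text: str) -> str:
    """Wrap text in music-note markers to hint that it should be sung."""
    return f"♪ {text} ♪"


# Plain speech with a laugh in front of it.
laugh_prompt = with_cue("That was not what I expected.", "laughs")

# A sung line.
song_prompt = as_lyrics("In the jungle, the mighty jungle")

# Code-switched text is passed as-is; language detection is left to the model.
mixed_prompt = "I ordered the soup, pero no tenían pan."
```

Each resulting string would be passed to the model as an ordinary text prompt.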

Maintenance & Community

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use.

Limitations & Caveats

Bark is a research model and may deviate unexpectedly from prompts, producing higher-variance output than traditional TTS. Because of its GPT-style architecture, a single generation is typically limited to about 13-14 seconds. Audio quality varies and can sometimes resemble an old phone call.
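One common way to work around the per-generation length limit is to split long text into sentence-sized chunks, synthesize each chunk separately, and join the results with short pauses. The sketch below illustrates that idea only and is not taken from the project's notebooks. The names `split_sentences`, `chunk_text`, and `generate_long` are hypothetical, the 30-word budget is an assumed rough proxy for 13-14 seconds of speech, and `synthesize` stands in for whatever function turns one chunk into audio samples.

```python
import re
from typing import Callable, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Split on whitespace that follows '.', '!' or '?' (a deliberately simple rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def chunk_text(text: str, max_words: int = 30) -> List[str]:
    """Group whole sentences into chunks of at most `max_words` words where possible."""
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def generate_long(
    text: str,
    synthesize: Callable[[str], Sequence[float]],
    sample_rate: int,
    pause_s: float = 0.25,
    max_words: int = 30,
) -> List[float]:
    """Synthesize each chunk and append a short silence after every chunk."""
    silence = [0.0] * int(pause_s * sample_rate)
    audio: List[float] = []
    for chunk in chunk_text(text, max_words):
        audio.extend(synthesize(chunk))
        audio.extend(silence)
    return audio
```

A sentence longer than the budget is kept whole rather than cut mid-sentence, so it may still exceed the limit. Using the same voice preset for every chunk helps keep the voice consistent across the joined audio.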

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 1

Star History

  • 855 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Pietro Schirano (Founder of MagicPath), and 1 more.

metavoice-src by metavoiceio

0%
4k
TTS model for human-like, expressive speech
created 1 year ago
updated 1 year ago
Starred by Dan Guido (Cofounder of Trail of Bits), Joe Walnes (Head of Experimental Projects at Stripe), and 1 more.

chatterbox by resemble-ai

1.6%
10k
Open-source TTS model
created 3 months ago
updated 1 day ago