bark by suno-ai

Generative audio model for realistic speech and sound effects

Created 2 years ago
38,507 stars

Top 0.8% on SourcePulse

Project Summary

Bark is a text-prompted generative audio model that produces realistic, multilingual speech, music, and sound effects. It's designed for researchers and power users seeking a flexible audio generation tool beyond traditional text-to-speech. Bark offers creative control and can generate non-speech sounds like laughter and sighs, with a focus on realistic voice and prosody.

How It Works

Bark is a transformer-based, fully generative text-to-audio model, similar to AudioLM and Vall-E, that operates on a quantized audio representation from EnCodec. Unlike conventional TTS, Bark converts text directly to audio without intermediate phonemes, which lets it generalize to music, sound effects, and other non-speech sounds. Given a voice preset, it tries to match the preset's tone, pitch, and emotion, but it does not support custom voice cloning.
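
The text-to-audio flow with a voice preset can be sketched roughly as follows. This is a minimal sketch, not the project's official quick start: it assumes the `bark` package exposes `generate_audio`, `preload_models`, and `SAMPLE_RATE`, that SciPy is installed, and that bracketed cues such as `[laughs]` are recognized in prompts. The helpers `build_prompt` and `synthesize`, the preset name, and the output path are hypothetical names chosen for illustration.

```python
from typing import Optional


def build_prompt(text: str, cue: Optional[str] = None) -> str:
    """Return the prompt text, optionally followed by a bracketed non-speech cue.

    Hypothetical helper: it only formats a string and does not call Bark.
    """
    text = text.strip()
    if cue is None:
        return text
    # Accept both "laughs" and "[laughs]" and normalize to "[laughs]".
    return f"{text} [{cue.strip('[] ')}]"


def synthesize(prompt: str,
               preset: str = "v2/en_speaker_6",  # assumed preset name
               out_path: str = "bark_out.wav") -> str:
    """Generate audio for `prompt` with a voice preset and write a WAV file."""
    # Imported here so build_prompt stays usable without bark or SciPy installed.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()  # downloads and caches model weights on first use
    audio = generate_audio(prompt, history_prompt=preset)  # NumPy waveform
    write_wav(out_path, SAMPLE_RATE, audio)
    return out_path
```

A call such as `synthesize(build_prompt("Hello, my name is Suno.", "laughs"))` would then write one short clip. Because the model is generative, repeated calls with the same prompt can sound different.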

Quick Start & Requirements

Highlighted Details

  • Supports 100+ speaker presets across multiple languages.
  • Can generate music, background noise, and sound effects alongside speech.
  • Offers automatic language detection and accent generation for code-switched text.
  • Long-form generation capabilities are documented in provided notebooks.
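
To make the "100+ speaker presets" figure concrete, the sketch below enumerates preset names. It rests on assumptions that should be checked against the project's own documentation: that presets follow a `v2/<lang>_speaker_<n>` naming pattern, that each language has ten speakers (0-9), and that the language codes listed are the supported ones. `preset_name` and `ALL_PRESETS` are hypothetical names.

```python
# Assumed language codes; the project's documentation is the authoritative list.
LANGUAGES = ("de", "en", "es", "fr", "hi", "it", "ja",
             "ko", "pl", "pt", "ru", "tr", "zh")
SPEAKERS_PER_LANGUAGE = 10  # assumed: speakers numbered 0-9 per language


def preset_name(lang: str, speaker: int) -> str:
    """Build a preset identifier such as "v2/en_speaker_6" (assumed pattern)."""
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language code: {lang!r}")
    if not 0 <= speaker < SPEAKERS_PER_LANGUAGE:
        raise ValueError(f"speaker index out of range: {speaker}")
    return f"v2/{lang}_speaker_{speaker}"


# Every (language, speaker) combination under the assumptions above.
ALL_PRESETS = [preset_name(lang, n)
               for lang in LANGUAGES
               for n in range(SPEAKERS_PER_LANGUAGE)]
```

Under these assumptions the list holds 13 x 10 = 130 names, which is consistent with the "100+" figure above.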

Maintenance & Community

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use.

Limitations & Caveats

Bark is a research model: it may deviate unexpectedly from prompts and produces higher-variance output than traditional TTS. Because of its GPT-style architecture, a single generation is typically limited to about 13-14 seconds. Audio quality varies and can sometimes resemble an old phone call.
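
One common way to work around the per-generation length limit is to split long text into short chunks, generate each chunk with the same voice preset, and concatenate the results. The sketch below illustrates that idea and is not necessarily the approach taken in the project's notebooks. The 30-word budget is a rough assumption about how much speech fits in about 13 seconds, and `split_for_bark` and `generate_long` are hypothetical helpers. `generate_long` assumes the `bark` package and NumPy are installed.

```python
import re
from typing import List


def split_for_bark(text: str, max_words: int = 30) -> List[str]:
    """Group whole sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` is kept intact as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def generate_long(text: str, preset: str = "v2/en_speaker_6",
                  pause_s: float = 0.25):
    """Generate each chunk with the same preset and join them with short pauses."""
    # Imported here so split_for_bark stays usable without bark installed.
    import numpy as np
    from bark import SAMPLE_RATE, generate_audio

    silence = np.zeros(int(pause_s * SAMPLE_RATE), dtype=np.float32)
    pieces = []
    for chunk in split_for_bark(text):
        pieces.append(generate_audio(chunk, history_prompt=preset))
        pieces.append(silence)
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)
```

Reusing one preset for every chunk keeps the voice roughly consistent, though some drift between chunks is still possible given the model's variance.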

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 2
  • Star history: 174 stars in the last 30 days
