resonance by code-with-antonio

Open-source AI platform for advanced text-to-speech and voice cloning

Created 1 month ago
278 stars

Top 93.4% on SourcePulse

Project Summary

Resonance is an open-source, self-hostable platform for AI-powered text-to-speech (TTS) and voice cloning, positioned as an alternative to commercial services such as ElevenLabs. It targets developers and power users who want a customizable platform for generating speech and cloning voices, with more control over deployment and potentially lower costs than proprietary services.

How It Works

The application is built with Next.js 16 and React 19 and uses Chatterbox TTS for speech generation and zero-shot voice cloning. Cloning a voice needs only a 10-second audio sample and no fine-tuning. Inference runs on Modal's serverless GPUs (NVIDIA A10G instances), so capacity can scale with demand. Clerk Organizations handles authentication and multi-tenancy, while Polar implements usage-based billing, character metering, and voice creation pricing. Audio assets and voice reference files are stored in Cloudflare R2 buckets.
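
The request path described above can be sketched as a small orchestration function. This is an illustrative outline only, not code from the Resonance repository: the interfaces `SpeechEngine`, `AudioStore`, and `UsageMeter`, the function `generateSpeech`, and the object-key layout are hypothetical stand-ins for the Modal inference call, the R2 upload, and the Polar usage event. Authentication through Clerk is assumed to have already resolved `orgId`.

```typescript
// Hypothetical sketch: every name below is invented for illustration and
// does not come from the Resonance codebase.

// Stand-in for the Chatterbox TTS endpoint deployed on Modal.
interface SpeechEngine {
  synthesize(text: string, voiceId: string): Promise<Uint8Array>;
}

// Stand-in for the Cloudflare R2 bucket holding generated audio.
interface AudioStore {
  put(key: string, data: Uint8Array): Promise<void>;
}

// Stand-in for reporting metered usage to Polar.
interface UsageMeter {
  record(orgId: string, characters: number): Promise<void>;
}

interface SpeechJob {
  orgId: string; // assumed to come from the authenticated Clerk organization
  voiceId: string;
  text: string;
}

interface SpeechDeps {
  engine: SpeechEngine;
  store: AudioStore;
  meter: UsageMeter;
}

interface SpeechResult {
  audioKey: string;
  characters: number;
}

async function generateSpeech(
  job: SpeechJob,
  deps: SpeechDeps,
  jobId: string
): Promise<SpeechResult> {
  // Counts UTF-16 code units; the project's real metering rule is not stated.
  const characters = job.text.length;
  if (characters === 0) {
    throw new Error("text must not be empty");
  }

  const audio = await deps.engine.synthesize(job.text, job.voiceId);

  // Prefixing keys with the organization id is one way to keep tenants apart.
  const audioKey = `${job.orgId}/${jobId}.wav`;
  await deps.store.put(audioKey, audio);

  // Usage is recorded only after the audio has been stored successfully.
  await deps.meter.record(job.orgId, characters);

  return { audioKey, characters };
}
```

Passing the three services in as `deps` keeps the function testable with in-memory fakes, which is a design choice of this sketch rather than a description of how the project is organized.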

Quick Start & Requirements

  • Installation: Clone the repository (git clone), navigate into the directory (cd resonance), and install dependencies (npm install).
  • Prerequisites: Node.js (version 20.9 or later), Prisma, a PostgreSQL database, a Clerk account (with Organizations enabled), a Cloudflare R2 bucket, a Modal account, and a Polar account.
  • Setup: Requires configuring environment variables (.env), setting up meters and products in Polar for billing, deploying the Chatterbox TTS engine to Modal, migrating the database (npx prisma migrate deploy), and seeding built-in voices (npx prisma db seed).
  • Resources: A comprehensive 12-hour YouTube tutorial is available, covering all features from scratch. A one-click deploy option to Railway is also provided.
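
Because setup spans several external services, a startup check that reports missing configuration can save debugging time. The sketch below is only an example under stated assumptions: the variable names in `REQUIRED_ENV` are illustrative guesses and should be replaced with the names the repository's own environment template lists, and `findMissingEnv` is a hypothetical helper, not part of the project.

```typescript
// Hypothetical variable names, one per external service.
// Replace them with the names the project's environment template uses.
const REQUIRED_ENV: string[] = [
  "DATABASE_URL",       // PostgreSQL connection string read by Prisma
  "CLERK_SECRET_KEY",   // Clerk
  "POLAR_ACCESS_TOKEN", // Polar
  "R2_BUCKET_NAME",     // Cloudflare R2
  "MODAL_ENDPOINT_URL", // deployed Chatterbox TTS endpoint on Modal
];

type EnvSource = { [name: string]: string | undefined };

// Returns the names that are unset or blank, in the order they were listed.
function findMissingEnv(
  env: EnvSource,
  required: string[] = REQUIRED_ENV
): string[] {
  return required.filter((name) => {
    const value = env[name];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this could be called as `findMissingEnv(process.env)` at startup, stopping early with a clear message if the returned list is not empty.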

Highlighted Details

  • Zero-Shot Voice Cloning: Instantly clone voices from short audio samples without fine-tuning.
  • Advanced TTS: Generate speech with adjustable creativity, variety, expression, and flow parameters.
  • Built-in Voices: Includes 20 pre-seeded system voices across 12 categories and 5 locales.
  • Multi-Tenant Architecture: Supports team-based access and data isolation via Clerk Organizations.
  • Usage-Based Billing: Implements pay-as-you-go character metering and voice creation pricing via Polar.
  • Responsive UI: Features a mobile-first design with adaptive layouts and accessible controls.
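
As a rough illustration of the generation parameters and character metering listed above, the following sketch normalizes the four settings and estimates a charge from a character count. The 0 to 1 range, the 0.5 default, the per-1,000-character pricing unit, and the names `SpeechSettings`, `normalizeSettings`, and `estimateChargeCents` are all assumptions made for the example; the source does not state actual ranges or prices.

```typescript
// The four adjustable parameters named in the feature list.
interface SpeechSettings {
  creativity: number;
  variety: number;
  expression: number;
  flow: number;
}

// Assumed default; the real defaults and ranges are not given in the source.
const DEFAULT_LEVEL = 0.5;

// Missing or non-finite values fall back to the default; others are clamped
// into the assumed 0..1 range.
function clampLevel(value: number | undefined): number {
  if (value === undefined || !isFinite(value)) {
    return DEFAULT_LEVEL;
  }
  return Math.min(1, Math.max(0, value));
}

// Fills in missing settings and clamps each one.
function normalizeSettings(input: Partial<SpeechSettings>): SpeechSettings {
  return {
    creativity: clampLevel(input.creativity),
    variety: clampLevel(input.variety),
    expression: clampLevel(input.expression),
    flow: clampLevel(input.flow),
  };
}

// Pay-as-you-go estimate in whole cents, rounded up.
// `centsPerThousandChars` is a hypothetical price, not Resonance's pricing.
function estimateChargeCents(
  characters: number,
  centsPerThousandChars: number
): number {
  if (characters < 0 || centsPerThousandChars < 0) {
    throw new RangeError("inputs must be non-negative");
  }
  return Math.ceil((characters * centsPerThousandChars) / 1000);
}
```

For example, with an assumed price of 30 cents per 1,000 characters, `estimateChargeCents(2500, 30)` evaluates to 75.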

Maintenance & Community

The project is maintained by code-with-antonio. The README lists no community channels (such as Discord or Slack), notable contributors, or sponsors.

Licensing & Compatibility

The README does not specify a license for this repository. The lack of explicit licensing is a significant adoption blocker, particularly for commercial use or integration into closed-source projects.

Limitations & Caveats

Serverless GPU inference on Modal may add cold-start latency to the first request after a period of inactivity. Setup is complex, requiring integration and configuration of multiple external services (Clerk, Polar, Modal, R2, PostgreSQL). The unspecified license also needs to be clarified before any adoption decision.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 98 stars in the last 30 days
