speechgpt by hahahumble

Web app for conversing with ChatGPT via speech

Created 2 years ago

2,773 stars

Top 16.8% on SourcePulse

1 Expert Loves This Project

Taranjeet Singh

Cofounder of Mem0

Project Summary

SpeechGPT is an open-source web application enabling users to converse with ChatGPT via voice, targeting language learners and general users seeking interactive AI experiences. It offers a privacy-first, mobile-friendly interface with extensive language support and flexible speech input/output options.

How It Works

The application is built on web technologies and provides a conversational interface to ChatGPT. Speech recognition and synthesis can run on the browser's built-in capabilities, and Azure Speech Services and Amazon Polly are available as optional alternatives for higher accuracy and more natural-sounding output. Data is processed and stored locally, prioritizing user privacy.
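The choice between built-in and cloud speech services described above can be sketched as a small fallback rule. This is an illustration only, not the project's actual code: `SpeechCredentials`, `choose_synthesis_backend`, and the backend labels are hypothetical names, and the fallback behavior is an assumption. The credential fields mirror the ones listed under Prerequisites.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpeechCredentials:
    """Hypothetical container for the optional cloud credentials."""
    azure_region: Optional[str] = None
    azure_access_key: Optional[str] = None
    polly_region: Optional[str] = None
    polly_access_key_id: Optional[str] = None
    polly_secret_access_key: Optional[str] = None


def choose_synthesis_backend(creds: SpeechCredentials, preferred: str = "browser") -> str:
    """Return "azure", "polly", or "browser".

    Assumed rule: use the preferred cloud service only when all of its
    credentials are present; otherwise fall back to the browser's
    built-in speech synthesis.
    """
    azure_ready = bool(creds.azure_region and creds.azure_access_key)
    polly_ready = bool(
        creds.polly_region
        and creds.polly_access_key_id
        and creds.polly_secret_access_key
    )
    if preferred == "azure" and azure_ready:
        return "azure"
    if preferred == "polly" and polly_ready:
        return "polly"
    return "browser"
```

With no credentials configured, the sketch always selects the built-in browser path, which matches the summary's point that the cloud services are optional.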

Quick Start & Requirements

Install/Run: docker run -d -p 8080:8080 --name speechgpt hahahumble/speechgpt
Prerequisites: OpenAI API key. Optional: Azure Speech Services credentials (Region, Access Key) or Amazon Polly credentials (Region, Access Key ID, and Secret Access Key for an identity with the AmazonPollyFullAccess policy).
Access: Visit http://localhost:8080/.
Docs: Website, Development Guide, Changelog
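
If port 8080 is already taken on the host, the Install/Run command can be adapted by changing only the host side of the port mapping. The helper below is a minimal sketch of that idea; `docker_run_args` and `app_url` are hypothetical names introduced here, while the image name, container name, and container port are taken from the command above.

```python
from typing import List

IMAGE = "hahahumble/speechgpt"  # image name from the Install/Run line
CONTAINER_PORT = 8080           # container-side port from the Install/Run line


def docker_run_args(host_port: int = 8080, name: str = "speechgpt") -> List[str]:
    """Build the argument list for the documented `docker run` command.

    Only the host port and container name vary; the container port stays
    at 8080, as in the original command.
    """
    if not 1 <= host_port <= 65535:
        raise ValueError(f"host_port out of range: {host_port}")
    return [
        "docker", "run", "-d",
        "-p", f"{host_port}:{CONTAINER_PORT}",
        "--name", name,
        IMAGE,
    ]


def app_url(host_port: int = 8080) -> str:
    """URL at which the app should be reachable for the given host port."""
    return f"http://localhost:{host_port}/"
```

The list returned by `docker_run_args(3000)` can be passed to `subprocess.run(..., check=True)`, after which the app would be expected at `app_url(3000)`.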

Highlighted Details

Supports over 100 languages for both speech recognition and synthesis.
Offers choice between built-in and cloud-based (Azure, Polly) speech services.
Designed for mobile-friendliness and local data storage.
Open-source and free to use and modify.

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

License: MIT License.
Compatibility: The permissive MIT license allows commercial use and integration with closed-source applications.

Limitations & Caveats

The application requires an OpenAI API key, and OpenAI charges for API usage. The optional cloud speech services require managing cloud provider credentials and may incur their own costs. The README does not provide performance benchmarks or describe known limitations of the built-in speech capabilities.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days