speechgpt by hahahumble

Web app for conversing with ChatGPT via speech

created 2 years ago
2,764 stars

Top 17.6% on sourcepulse

Project Summary

SpeechGPT is an open-source web application enabling users to converse with ChatGPT via voice, targeting language learners and general users seeking interactive AI experiences. It offers a privacy-first, mobile-friendly interface with extensive language support and flexible speech input/output options.

How It Works

The application is built on web technologies and provides a conversational interface to ChatGPT. Speech recognition and synthesis run on the browser's built-in capabilities by default; users can optionally switch to Azure Speech Services or Amazon Polly for enhanced accuracy and naturalness. Data is processed and stored locally, which prioritizes user privacy.
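
To illustrate the built-in speech path, here is a minimal sketch of how a web app might check whether the browser offers speech recognition and synthesis before relying on them. It assumes that "browser-based" speech refers to the standard Web Speech API (the `SpeechRecognition` / `webkitSpeechRecognition` constructors and the `speechSynthesis` object); the summary does not confirm this. The names `SpeechSupport` and `detectBrowserSpeech` are hypothetical and are not taken from the SpeechGPT codebase.

```typescript
// Sketch only: assumes "browser-based" speech means the Web Speech API.
// SpeechSupport and detectBrowserSpeech are hypothetical names.
interface SpeechSupport {
  recognition: boolean;
  synthesis: boolean;
}

// Accepts a window-like object so the check can also run outside a browser.
function detectBrowserSpeech(scope: Record<string, unknown>): SpeechSupport {
  return {
    // Some browsers only expose the prefixed webkitSpeechRecognition.
    recognition:
      "SpeechRecognition" in scope || "webkitSpeechRecognition" in scope,
    // speechSynthesis is the entry point for built-in text-to-speech.
    synthesis: "speechSynthesis" in scope,
  };
}
```

In a browser, the function would be called with the global `window` object. When either capability is missing, an app could hide the corresponding control or suggest one of the optional cloud services instead.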

Quick Start & Requirements

  • Install/Run: docker run -d -p 8080:8080 --name speechgpt hahahumble/speechgpt
  • Prerequisites: OpenAI API Key. Optional: Azure Speech Services credentials (Region, Access Key) or Amazon Polly credentials (Region, Access Key ID, Secret Access Key with AmazonPollyFullAccess).
  • Access: Visit http://localhost:8080/.
  • Docs: Website, Development Guide, Changelog
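
The prerequisites above can be expressed as a small checklist. The sketch below reports which required values are still missing for a given configuration: the OpenAI API key is always required, while the Azure and Amazon Polly fields are only checked when that service is configured at all. The `Prerequisites` shape and the `missingPrerequisites` function are hypothetical illustrations, not part of SpeechGPT.

```typescript
// Hypothetical settings shape mirroring the prerequisites listed above.
interface Prerequisites {
  openaiApiKey?: string;
  azure?: { region?: string; accessKey?: string };
  polly?: { region?: string; accessKeyId?: string; secretAccessKey?: string };
}

// Returns human-readable names of the values that are still missing.
function missingPrerequisites(p: Prerequisites): string[] {
  const empty = (v?: string): boolean => v === undefined || v.trim() === "";
  const missing: string[] = [];

  // The OpenAI API key is the only mandatory credential.
  if (empty(p.openaiApiKey)) missing.push("OpenAI API Key");

  // Optional services are validated only if the user opted into them.
  if (p.azure) {
    if (empty(p.azure.region)) missing.push("Azure Region");
    if (empty(p.azure.accessKey)) missing.push("Azure Access Key");
  }
  if (p.polly) {
    if (empty(p.polly.region)) missing.push("Polly Region");
    if (empty(p.polly.accessKeyId)) missing.push("Polly Access Key ID");
    if (empty(p.polly.secretAccessKey)) missing.push("Polly Secret Access Key");
  }
  return missing;
}
```

An empty result means the configuration covers everything listed under Prerequisites; it does not verify that the credentials are valid or that the Polly key carries the AmazonPollyFullAccess policy.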

Highlighted Details

  • Supports over 100 languages for both speech recognition and synthesis.
  • Offers choice between built-in and cloud-based (Azure, Polly) speech services.
  • Designed for mobile-friendliness and local data storage.
  • Open-source and free to use and modify.

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive MIT license allows for commercial use and integration with closed-source applications.

Limitations & Caveats

The application requires an OpenAI API key, which incurs usage-based costs. The optional cloud speech services require managing cloud provider credentials and may add further costs. The README does not provide performance benchmarks or describe known limitations of the built-in speech capabilities.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
