aura-voice  by ntegrals

In-browser voice assistant for low-latency interaction

Created 1 year ago
1,241 stars

Top 31.8% on SourcePulse

Project Summary

Aura is a browser-based AI voice assistant designed for low-latency interaction, aimed at users who want a web-native alternative to existing voice assistants. It combines hosted speech-to-text, language-model, and text-to-speech services to provide responsive, natural conversation directly in the browser.

How It Works

Aura integrates Vercel Edge Functions for rapid request handling, Whisper for accurate speech-to-text transcription, GPT-4o Mini for natural language understanding and response generation, and Eleven Labs for high-quality, low-latency text-to-speech streaming. This combination aims to minimize the round-trip time for voice commands, making web-based voice interaction feel more immediate.
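The flow described above can be sketched as three stages chained in sequence. This is a minimal illustration, not the project's actual code: the `VoiceStages` interface, `handleTurn`, and the stub stages are all hypothetical names, and the stubs stand in for Whisper, GPT-4o Mini, and Eleven Labs so the sketch runs without API keys.

```typescript
// Hypothetical stage signatures. In Aura these roles are filled by Whisper,
// GPT-4o Mini, and Eleven Labs; the names and shapes here are assumptions.
type AudioChunk = Uint8Array;

interface VoiceStages {
  // Speech-to-text: recorded audio in, transcript out.
  transcribe(audio: AudioChunk): Promise<string>;
  // Language model: transcript in, reply text out.
  respond(transcript: string): Promise<string>;
  // Streaming text-to-speech: calls onChunk for each piece of audio.
  synthesize(text: string, onChunk: (chunk: AudioChunk) => void): Promise<void>;
}

// Runs one voice turn: audio in, streamed audio out through onChunk.
async function handleTurn(
  audio: AudioChunk,
  stages: VoiceStages,
  onChunk: (chunk: AudioChunk) => void,
): Promise<{ transcript: string; reply: string }> {
  const transcript = await stages.transcribe(audio);
  const reply = await stages.respond(transcript);
  await stages.synthesize(reply, onChunk);
  return { transcript, reply };
}

// Stub stages so the sketch runs without network access or API keys.
const stubStages: VoiceStages = {
  transcribe: async (audio) => `heard ${audio.length} bytes`,
  respond: async (transcript) => `You said: ${transcript}`,
  synthesize: async (text, onChunk) => {
    // Emit one fake chunk per word to mimic streaming playback.
    for (const word of text.split(" ")) {
      onChunk(Uint8Array.from([word.length]));
    }
  },
};
```

Because each stage waits for the previous one, total latency in this sketch is the sum of the three stages. Streaming the final stage lets playback begin before synthesis has finished.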

Quick Start & Requirements

  • Install dependencies: npm install
  • Run the app: npm run dev
  • Prerequisites: Node.js, OpenAI API Key, Eleven Labs API Key, and Eleven Labs Voice ID.
  • Setup involves cloning the repository, configuring API keys in .env.local, and installing Node.js dependencies.
  • Demo available at: https://voice.julianschoen.co
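Since setup depends on three credentials, a small startup check can catch a missing value early. The sketch below is an assumption-laden illustration: the variable names `OPENAI_API_KEY`, `ELEVENLABS_API_KEY`, and `ELEVENLABS_VOICE_ID` are hypothetical and may not match the names the repository expects in .env.local.

```typescript
// Hypothetical variable names; check the repository's README for the real ones.
const REQUIRED_KEYS = [
  "OPENAI_API_KEY",
  "ELEVENLABS_API_KEY",
  "ELEVENLABS_VOICE_ID",
] as const;

// Returns the required keys that are absent or blank in the given environment.
function missingKeys(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KEYS.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node.js process this could be called as `missingKeys(process.env)` before the server starts, logging the returned names if the list is not empty.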

Highlighted Details

  • Optimized for low latency using Vercel Edge Functions.
  • Integrates Whisper, GPT-4o Mini, and Eleven Labs TTS streaming.
  • Aims to replicate Siri-like functionality in a web browser.

Maintenance & Community

The project is maintained by Julian Schoen (@julianschoen on Twitter). He can be reached by email (j.schoen@mail.com) for discussions, mentorship, or hiring. A "Buy Me A Coffee" link is provided for support.

Licensing & Compatibility

Distributed under the MIT License. This license permits commercial use and integration into closed-source projects.

Limitations & Caveats

The project is described as an "experimental application." Users are responsible for managing their OpenAI API token usage and the associated costs, since API usage is billed per token and can add up. The current implementation may not yet include latency mitigation strategies such as response splitting.
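The source does not define response splitting. One common reading is to cut the model's streamed reply into sentences and send each to text-to-speech as soon as it is complete, rather than waiting for the whole reply. The sketch below illustrates that idea only; `SentenceSplitter` is a hypothetical helper and is not part of the project.

```typescript
// Hypothetical helper: buffers streamed text and emits each sentence
// as soon as it is complete, so speech synthesis can start early.
class SentenceSplitter {
  private buffer = "";

  // Feed one streamed fragment; returns any sentences it completed.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    // Treat ., !, or ? followed by whitespace as a sentence boundary.
    const boundary = /[.!?]\s+/;
    let match = boundary.exec(this.buffer);
    while (match !== null) {
      const end = match.index + 1; // keep the punctuation mark
      sentences.push(this.buffer.slice(0, end).trim());
      this.buffer = this.buffer.slice(match.index + match[0].length);
      match = boundary.exec(this.buffer);
    }
    return sentences;
  }

  // Call once the stream ends to emit whatever text is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}
```

The boundary rule is deliberately simple: it will split after abbreviations such as "Dr." and it waits for trailing whitespace before emitting a sentence, so a real implementation would likely need a more careful tokenizer.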

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days
