10x by 0xCrunchyy

In-browser voice assistant for low-latency interaction

Created 2 years ago

1,387 stars

Top 28.8% on SourcePulse

Project Summary

Aura is a browser-based AI voice assistant designed for low-latency interaction, aimed at users who want a web-native alternative to existing voice assistants. It combines hosted speech-to-text, language-model, and text-to-speech services to deliver a responsive, natural conversational experience directly in the browser.

How It Works

Aura integrates Vercel Edge Functions for rapid request handling, Whisper for accurate speech-to-text transcription, GPT-4o Mini for natural language understanding and response generation, and Eleven Labs for high-quality, low-latency text-to-speech streaming. This combination aims to minimize the round-trip time for voice commands, making web-based voice interaction feel more immediate.
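
The chain described above can be sketched as three HTTP requests made in sequence. This is a minimal illustration, not the project's actual code: the function names, the `Keys` shape, the system prompt, and the `speech.webm` filename are assumptions. The endpoints are the public OpenAI and Eleven Labs REST routes.

```typescript
// Sketch of one voice turn: speech -> text -> reply -> speech.
// All function and type names here are illustrative assumptions.

interface Keys {
  openaiKey: string;
  elevenLabsKey: string;
  voiceId: string;
}

interface HttpRequest {
  url: string;
  headers: Record<string, string>;
  body: FormData | string;
}

// Step 1: Whisper transcription request (multipart upload).
function transcriptionRequest(audio: Blob, keys: Keys): HttpRequest {
  const form = new FormData();
  form.append("file", audio, "speech.webm");
  form.append("model", "whisper-1");
  return {
    url: "https://api.openai.com/v1/audio/transcriptions",
    headers: { Authorization: `Bearer ${keys.openaiKey}` },
    body: form,
  };
}

// Step 2: GPT-4o Mini chat completion request.
function chatRequest(transcript: string, keys: Keys): HttpRequest {
  return {
    url: "https://api.openai.com/v1/chat/completions",
    headers: {
      Authorization: `Bearer ${keys.openaiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [
        { role: "system", content: "You are a concise voice assistant." },
        { role: "user", content: transcript },
      ],
    }),
  };
}

// Step 3: Eleven Labs streaming text-to-speech request.
function speechRequest(text: string, keys: Keys): HttpRequest {
  const voice = encodeURIComponent(keys.voiceId);
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voice}/stream`,
    headers: { "xi-api-key": keys.elevenLabsKey, "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  };
}

// Sends a request and fails loudly on a non-2xx status.
async function post(req: HttpRequest): Promise<Response> {
  const res = await fetch(req.url, { method: "POST", headers: req.headers, body: req.body });
  if (!res.ok) throw new Error(`${req.url} failed with status ${res.status}`);
  return res;
}

// Chains the three calls. The audio response is returned unread so the
// caller can forward its body to the browser as the bytes arrive.
async function voiceTurn(audio: Blob, keys: Keys): Promise<Response> {
  const heard = (await (await post(transcriptionRequest(audio, keys))).json()) as { text: string };
  const chat = (await (await post(chatRequest(heard.text, keys))).json()) as {
    choices: { message: { content: string } }[];
  };
  return post(speechRequest(chat.choices[0].message.content, keys));
}
```

In an edge handler, the `Response` returned by `voiceTurn` could be passed back to the browser with its body still streaming, which is where the latency benefit of streaming text-to-speech would come from.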

Quick Start & Requirements

Install dependencies: npm install
Run the app: npm run dev
Prerequisites: Node.js, OpenAI API Key, Eleven Labs API Key, and Eleven Labs Voice ID.
Setup involves cloning the repository, configuring API keys in .env.local, and installing Node.js dependencies.
Demo available at: https://voice.julianschoen.co
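
Since the app needs three secrets before it can run, a startup check along the following lines can catch a missing key early. This is a sketch only: the variable names `OPENAI_API_KEY`, `ELEVENLABS_API_KEY`, and `ELEVENLABS_VOICE_ID` are assumptions, and the repository's own documentation should be checked for the names it actually expects in .env.local.

```typescript
// Hypothetical variable names; the real ones are defined by the repository.
const REQUIRED_KEYS = ["OPENAI_API_KEY", "ELEVENLABS_API_KEY", "ELEVENLABS_VOICE_ID"] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type AppConfig = Record<RequiredKey, string>;

// Returns the trimmed values, or throws one error listing every missing key.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing values in .env.local: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key]!.trim();
  }
  return config;
}
```

Calling `loadConfig(process.env)` at startup would work if the framework exposes .env.local values through `process.env`, as Next.js does. Whether this project relies on that mechanism is an assumption.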

Highlighted Details

Optimized for low latency using Vercel Edge Functions.
Integrates Whisper, GPT-4o Mini, and Eleven Labs TTS streaming.
Aims to replicate Siri-like functionality in a web browser.

Maintenance & Community

The project is maintained by Julian Schoen (@julianschoen on Twitter). He can be reached by email (j.schoen@mail.com) for discussions, mentorship, or hiring, and a "Buy Me A Coffee" link is provided for support.

Licensing & Compatibility

Distributed under the MIT License, which permits commercial use and integration into closed-source projects provided the copyright and license notice are retained.

Limitations & Caveats

The project is described as an "experimental application." Users are responsible for managing their own OpenAI API token usage and the associated costs, since GPT-4o Mini usage is billed per token and can become costly at volume. The current implementation may not yet include advanced latency mitigation strategies such as response splitting.
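
One common reading of response splitting is to cut the model's streamed reply into sentences and hand each one to text-to-speech as soon as it is complete, instead of waiting for the full text. The sketch below illustrates that idea under this assumption. The `SentenceSplitter` class is hypothetical and is not part of the project.

```typescript
// Buffers streamed text and releases complete sentences as they appear.
class SentenceSplitter {
  private buffer = "";

  // Adds a streamed fragment and returns any sentences it completed.
  push(fragment: string): string[] {
    this.buffer += fragment;
    const sentences: string[] = [];
    // A sentence ends at '.', '!' or '?' followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next fragment.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Call once the stream ends to release whatever text is left.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest ? [rest] : [];
  }
}
```

Each sentence returned by `push` could be sent to the speech service immediately, so playback of the first sentence can begin while later ones are still being generated. The boundary rule is deliberately naive: abbreviations such as "e.g. " would be treated as sentence ends.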
