ai-devices by developersdigest

AI device template for voice assistant

Created 1 year ago

294 stars

Top 90.1% on SourcePulse

Project Summary

This project provides a template for building AI-powered voice assistants, inspired by devices like the Humane AI Pin and Rabbit R1. It targets developers and power users who want to create custom AI experiences with voice input, text-to-speech, image processing, and function calling. It draws on models from OpenAI, Groq, and Fal.ai.
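
In function calling, the model replies with a function name and a JSON string of arguments, and the application must route that call to real code. The sketch below shows one way to do that routing in TypeScript. The `ToolCall` shape, the `dispatch` function, and the `searchWeb` and `playSong` handlers are hypothetical and are not taken from this project.

```typescript
// A model-returned function call: a name plus a JSON-encoded argument string.
interface ToolCall {
  name: string;
  arguments: string;
}

type Handler = (args: Record<string, unknown>) => string;

// Hypothetical handlers; the real project wires up its own integrations.
const handlers: Record<string, Handler> = {
  searchWeb: (args) => `searching for: ${String(args["query"] ?? "")}`,
  playSong: (args) => `playing: ${String(args["title"] ?? "")}`,
};

// Look up the named handler, parse its arguments, and run it.
function dispatch(call: ToolCall): string {
  const handler = handlers[call.name];
  if (!handler) {
    throw new Error(`Unknown function: ${call.name}`);
  }
  // Arguments arrive as a JSON string and may be malformed, so parse defensively.
  let parsed: unknown;
  try {
    parsed = JSON.parse(call.arguments);
  } catch {
    throw new Error(`Invalid JSON arguments for ${call.name}`);
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error(`Arguments for ${call.name} must be a JSON object`);
  }
  return handler(parsed as Record<string, unknown>);
}
```

Throwing on unknown names keeps the assistant from silently ignoring a call the model invented.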

How It Works

The assistant combines several AI services. Voice input is transcribed by Whisper, served either by OpenAI or by Groq. Text-to-speech output uses OpenAI's TTS models. Image processing runs on OpenAI's GPT-4 Vision or Fal.ai's Llava-Next. Function calling and dynamic UI rendering are handled by OpenAI's GPT-3.5-Turbo. Configuration is centralized in app/config.tsx. There, users select the provider and model for each feature and can optionally enable rate limiting (via Upstash) and LangSmith tracing.
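
This is a minimal sketch of how per-feature provider and model selection could be typed in TypeScript. The `AppConfig` interface, its field names, and the model identifiers (including the Fal.ai endpoint name) are assumptions for illustration. The actual app/config.tsx may be structured differently. Restricting `textToSpeech` and `functionCalling` to the literal type `"openai"` lets the compiler reject providers those features do not support.

```typescript
// Hypothetical config shape; the project's real app/config.tsx may differ.
type TranscriptionProvider = "openai" | "groq";
type VisionProvider = "openai" | "fal.ai";

interface FeatureConfig<P extends string> {
  provider: P;
  model: string;
}

interface AppConfig {
  transcription: FeatureConfig<TranscriptionProvider>;
  textToSpeech: FeatureConfig<"openai">; // OpenAI only
  vision: FeatureConfig<VisionProvider>;
  functionCalling: FeatureConfig<"openai">; // OpenAI only
  useRateLimiting: boolean; // Upstash
  useLangSmithTracing: boolean;
}

// In a real app this object would be exported from the config module.
const config: AppConfig = {
  transcription: { provider: "groq", model: "whisper-large-v3" },
  textToSpeech: { provider: "openai", model: "tts-1" },
  vision: { provider: "fal.ai", model: "fal-ai/llava-next" }, // assumed endpoint name
  functionCalling: { provider: "openai", model: "gpt-3.5-turbo" },
  useRateLimiting: false,
  useLangSmithTracing: false,
};
```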

Quick Start & Requirements

Install dependencies: npm install or bun install
Start development server: npm run dev or bun dev
Requires API keys for Groq and OpenAI, and optionally for Serper, Upstash, Spotify, and Fal.ai.
Access the application at http://localhost:3000.
Configuration details: app/config.tsx
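
The app depends on several API keys, so a small startup check can report missing ones before the server handles requests. This sketch assumes the environment variable names shown (for example `GROQ_API_KEY` and `FAL_KEY`). Those names and the `checkEnv` function are hypothetical, so confirm the real names in the project's own documentation.

```typescript
// Hypothetical environment variable names; confirm against the project's docs.
const REQUIRED_KEYS: readonly string[] = ["GROQ_API_KEY", "OPENAI_API_KEY"];
const OPTIONAL_KEYS: readonly string[] = [
  "SERPER_API_KEY", // internet search results
  "UPSTASH_REDIS_REST_URL", // rate limiting
  "UPSTASH_REDIS_REST_TOKEN",
  "SPOTIFY_CLIENT_ID", // Spotify integration
  "SPOTIFY_CLIENT_SECRET",
  "FAL_KEY", // Fal.ai vision
];

type Env = Record<string, string | undefined>;

interface EnvReport {
  missingRequired: string[];
  missingOptional: string[];
}

// A key counts as set only if it is present and not just whitespace.
function checkEnv(env: Env): EnvReport {
  const isSet = (key: string): boolean => (env[key] ?? "").trim() !== "";
  return {
    missingRequired: REQUIRED_KEYS.filter((key) => !isSet(key)),
    missingOptional: OPTIONAL_KEYS.filter((key) => !isSet(key)),
  };
}
```

In a Node.js server this could be called as `checkEnv(process.env)`, aborting startup if `missingRequired` is non-empty and merely logging `missingOptional`.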

Highlighted Details

Supports multiple LLM providers including Groq (Llama3) and OpenAI (GPT-4o, GPT-3.5-Turbo).
Integrates Whisper (via OpenAI or Groq) for transcription and OpenAI's TTS models for speech output.
Offers vision inference via OpenAI's GPT-4 Vision or Fal.ai's Llava-Next.
Includes optional rate limiting with Upstash and tracing with LangSmith.
Provides UI toggles for showing response times and for enabling TTS, internet results, and photo uploads.
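
Rate limiting caps how many requests a client can make in a given time window. The project delegates this to Upstash. The sketch below does not use the Upstash SDK. It illustrates the fixed-window idea with an in-memory `Map`. `FixedWindowLimiter` is a hypothetical name, and because its state lives in a single process, it would not coordinate limits across multiple server instances the way a shared Redis store can.

```typescript
// In-memory fixed-window limiter: at most `limit` requests per key per window.
// Illustrative stand-in only; it is not the Upstash API.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly counts = new Map<string, { windowStart: number; count: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed; `now` is injectable for testing.
  allow(key: string, now: number = Date.now()): boolean {
    // Align the timestamp to the start of its window.
    const windowStart = now - (now % this.windowMs);
    const entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // First request for this key in a new window: reset the counter.
      this.counts.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) {
      return false;
    }
    entry.count += 1;
    return true;
  }
}
```

A fixed window is the simplest scheme, but it can admit up to twice the limit around a window boundary. Sliding-window variants smooth that out at the cost of more bookkeeping.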

Maintenance & Community

The project is maintained by the developer behind Developers Digest. Support options include Patreon and Buy Me A Coffee. Links to the developer's website, GitHub, and Twitter are provided for engagement and updates.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

Text-to-speech and function calling currently support only OpenAI as a provider. Although inspired by commercial AI devices, the project is a developer template and requires significant setup and API key management.
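
Text-to-speech and function calling accept only OpenAI, so a startup check can fail fast when a configuration names another provider. This is a hedged sketch. The `Feature` names, the provider strings, and `validateProviders` are hypothetical. The allowed-provider table simply restates the support matrix described in this summary.

```typescript
type Feature = "transcription" | "textToSpeech" | "vision" | "functionCalling";

// Providers each feature accepts, per this summary's description.
const SUPPORTED: Record<Feature, readonly string[]> = {
  transcription: ["openai", "groq"],
  textToSpeech: ["openai"],
  vision: ["openai", "fal.ai"],
  functionCalling: ["openai"],
};

// Returns one human-readable error per unsupported selection (empty if valid).
function validateProviders(selection: Record<Feature, string>): string[] {
  const errors: string[] = [];
  for (const feature of Object.keys(SUPPORTED) as Feature[]) {
    const provider = selection[feature];
    const allowed = SUPPORTED[feature];
    if (!allowed.includes(provider)) {
      errors.push(`${feature}: "${provider}" is not supported (use ${allowed.join(" or ")})`);
    }
  }
  return errors;
}
```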


Explore Similar Projects

FluidVoice by altic-dev

S.A.T.U.R.D.A.Y by GRVYDEV

gpt-voice-conversation-chatbot by Adri6336

ollama-voice-mac by apeatling

speak-gpt by AndraxDev

AIUI by lspahija

10x by 0xCrunchyy

Babagaboosh by DougDougGithub

swift by ai-ng

Bing-GPT-Voice-Assistant by Ai-Austin

Speech-AI-Forge by lenML

Verbi by PromtEngineer