ai-devices  by developersdigest

AI device template for voice assistant

created 1 year ago
293 stars

Top 91.2% on sourcepulse

Project Summary

This project provides a template for building AI-powered voice assistants, inspired by devices like the Humane AI Pin and Rabbit R1. It targets developers and power users looking to create custom AI experiences with voice input, text-to-speech, image processing, and function calling capabilities, leveraging a variety of leading AI models.

How It Works

The assistant combines several AI services, one per core feature. Speech is transcribed by Whisper, served by either OpenAI or Groq. Text-to-speech output uses OpenAI's TTS models. Images are processed by OpenAI's GPT-4 Vision or Fal.ai's Llava-Next. Function calling and dynamic UI rendering are handled by OpenAI's GPT-3.5-Turbo. Configuration is centralized in app/config.tsx, where users select the provider and model for each feature and can optionally enable rate limiting (Upstash) and LangSmith tracing.
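The per-feature provider selection described above could be modeled roughly as follows. This is a minimal sketch, not the contents of app/config.tsx: the type names, field names (such as useRateLimiting and useTracing), the validateConfig helper, and the model strings are all assumptions made for illustration. Only the feature-to-provider pairing comes from this summary.

```typescript
// Hypothetical config shape; names are illustrative, not copied from app/config.tsx.
type Provider = "openai" | "groq" | "fal";
type Feature = "transcription" | "tts" | "vision" | "functionCalling";

interface FeatureChoice {
  provider: Provider;
  model: string; // model identifiers below are placeholders
}

type AssistantConfig = Record<Feature, FeatureChoice> & {
  useRateLimiting: boolean; // assumed flag for the optional Upstash rate limiting
  useTracing: boolean; // assumed flag for the optional tracing
};

// Providers available for each feature, as listed in the summary.
const SUPPORTED: Record<Feature, readonly Provider[]> = {
  transcription: ["openai", "groq"],
  tts: ["openai"],
  vision: ["openai", "fal"],
  functionCalling: ["openai"],
};

// Returns one message per feature whose provider is not in SUPPORTED.
function validateConfig(config: AssistantConfig): string[] {
  const problems: string[] = [];
  for (const feature of Object.keys(SUPPORTED) as Feature[]) {
    const provider = config[feature].provider;
    if (!SUPPORTED[feature].some((p) => p === provider)) {
      problems.push(`${feature}: provider "${provider}" is not supported`);
    }
  }
  return problems;
}

const exampleConfig: AssistantConfig = {
  transcription: { provider: "groq", model: "whisper-large-v3" },
  tts: { provider: "openai", model: "tts-1" },
  vision: { provider: "fal", model: "llava-next" },
  functionCalling: { provider: "openai", model: "gpt-3.5-turbo" },
  useRateLimiting: false,
  useTracing: false,
};
```

Calling validateConfig(exampleConfig) returns an empty list. Switching tts to a Groq provider would produce a single message, reflecting that text-to-speech is available only through OpenAI.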

Quick Start & Requirements

  • Install dependencies: npm install or bun install
  • Start development server: npm run dev or bun dev
  • Requires API keys for Groq and OpenAI; keys for Serper, Upstash, Spotify, and Fal.ai are optional.
  • Access the application at http://localhost:3000.
  • Configuration details: app/config.tsx
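Because the template depends on several API keys, it can help to check them before starting the development server. The sketch below is one way to do that. The environment variable names and the checkApiKeys helper are assumptions, not taken from the repository, so confirm the real names against the project's own environment file.

```typescript
// Assumed variable names; the repository may use different ones.
const REQUIRED_KEYS: string[] = ["GROQ_API_KEY", "OPENAI_API_KEY"];
const OPTIONAL_KEYS: string[] = [
  "SERPER_API_KEY",
  "UPSTASH_REDIS_REST_URL",
  "UPSTASH_REDIS_REST_TOKEN",
  "SPOTIFY_CLIENT_ID",
  "SPOTIFY_CLIENT_SECRET",
  "FAL_KEY",
];

type Env = Record<string, string | undefined>;

interface KeyReport {
  missingRequired: string[];
  missingOptional: string[];
}

// Treats unset, empty, and whitespace-only values as missing.
function checkApiKeys(env: Env): KeyReport {
  const isMissing = (name: string): boolean => (env[name] ?? "").trim() === "";
  return {
    missingRequired: REQUIRED_KEYS.filter(isMissing),
    missingOptional: OPTIONAL_KEYS.filter(isMissing),
  };
}
```

In a Node or Bun process this would typically be called as checkApiKeys(process.env). A non-empty missingRequired list means the app cannot reach Groq or OpenAI, while entries in missingOptional only disable the matching optional feature.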

Highlighted Details

  • Supports multiple LLM providers including Groq (Llama3) and OpenAI (GPT-4o, GPT-3.5-Turbo).
  • Integrates Whisper (via OpenAI or Groq) for transcription and OpenAI's TTS models for speech output.
  • Offers vision inference via OpenAI's GPT-4 Vision or Fal.ai's Llava-Next.
  • Includes optional rate limiting with Upstash and tracing with LangSmith.
  • Features customizable UI toggles for response times, TTS, internet results, and photo uploads.

Maintenance & Community

The project is maintained by the developer behind Developers Digest. Support options include Patreon and Buy Me A Coffee. Links to the developer's website, GitHub, and Twitter are provided for engagement and updates.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing terms before commercial use or integration into closed-source projects.

Limitations & Caveats

Text-to-speech and function calling currently support only OpenAI as a provider. Although inspired by commercial AI devices, the project is a developer template and requires significant setup and API key management.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
