ada by Nlouis38

AI assistant for STEM with voice/text interaction

Created 5 months ago
269 stars

Top 95.5% on SourcePulse

Project Summary

ADA (Advanced Design Assistant) is an AI assistant specializing in STEM fields, offering voice and text interaction for concise, accurate information and task assistance. It provides both a cloud-dependent online version leveraging Google Gemini and ElevenLabs for enhanced performance, and a local version reliant on user hardware and Ollama. The project is ideal for STEM professionals and researchers seeking an interactive AI assistant.

How It Works

ADA utilizes a modular architecture with distinct local and online components. The online version integrates Google Gemini for advanced natural language understanding and response generation, coupled with ElevenLabs for high-quality, low-latency Text-to-Speech (TTS). The local version relies on Ollama to serve models like Gemma, with performance directly tied to the user's hardware. Both versions employ RealtimeSTT for speech-to-text transcription and support function calling for task execution, such as accessing system information, managing timers, or performing web searches.
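
As a rough illustration of that loop (not the project's actual code), the sketch below wires RealtimeSTT transcription into Gemini with automatic function calling; the set_timer tool and the GEMINI_API_KEY variable name are illustrative assumptions, and in the real assistant a TTS engine would speak the reply.

```python
# Minimal sketch of the online pipeline described above (not ADA's actual code):
# RealtimeSTT transcribes speech and Gemini replies, with a toy tool standing in
# for ADA's real ones. GEMINI_API_KEY is an assumed environment-variable name.
import os

import google.generativeai as genai
from RealtimeSTT import AudioToTextRecorder

def set_timer(seconds: int) -> str:
    """Illustrative stand-in for one of ADA's callable tools."""
    return f"Timer set for {seconds} seconds."

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash", tools=[set_timer])
chat = model.start_chat(enable_automatic_function_calling=True)

recorder = AudioToTextRecorder()          # listens on the default microphone
while True:
    user_text = recorder.text()           # blocks until an utterance is transcribed
    reply = chat.send_message(user_text)  # Gemini may invoke set_timer() on its own
    print("ADA:", reply.text)             # a TTS engine would speak this in ADA
```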

Quick Start & Requirements

  • Installation: Clone the repository, create a virtual environment, and install dependencies with pip install -r requirements.txt (if no requirements.txt is present, install the key packages individually: ollama, websockets, pyaudio, RealtimeSTT, RealtimeTTS, torch, google-generativeai, opencv-python, pillow, mss, psutil, GPUtil, elevenlabs, python-dotenv, python-weather, googlemaps).
  • Prerequisites: Python 3.11+, Ollama (for the local version), a CUDA-compatible GPU (optional, recommended for local use), a microphone and speakers (headphones recommended), and API keys (Google Gemini, ElevenLabs, Google Maps) stored in a .env file (see the sketch after this list). FFmpeg is recommended for audio processing.
  • Setup: Expect setup to consist of cloning the repository, creating the environment, installing dependencies, and configuring API keys.
  • Links: GitHub Repository
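
The README names the required API keys but not their variable names; a minimal sketch of the .env setup using python-dotenv might look like the following, with GEMINI_API_KEY, ELEVENLABS_API_KEY, and GOOGLE_MAPS_API_KEY as assumed names.

```python
# Sketch of loading the API keys with python-dotenv; the variable names below are
# assumptions, not taken from the repository.
#
# .env (project root):
#   GEMINI_API_KEY=...
#   ELEVENLABS_API_KEY=...
#   GOOGLE_MAPS_API_KEY=...
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into the process environment
gemini_key = os.getenv("GEMINI_API_KEY")
elevenlabs_key = os.getenv("ELEVENLABS_API_KEY")
maps_key = os.getenv("GOOGLE_MAPS_API_KEY")
```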

Highlighted Details

  • Dual operational modes: ada_local and ada_online.
  • Real-time voice interaction via RealtimeSTT and RealtimeTTS (ElevenLabs or SystemEngine).
  • Function calling capabilities for task automation (e.g., system info, timers, project folders, weather, travel duration); a sketch of one such tool follows this list.
  • Multimodal demo (multimodal_live_api.py) supporting camera or screen sharing with audio.
  • STEM-focused knowledge base.
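
To make the function-calling bullet concrete, here is a hedged sketch of a system-info tool built on psutil and GPUtil from the dependency list; the function name and output format are illustrative, not taken from the repository.

```python
# Illustrative system-info tool of the kind ADA could expose for function calling;
# the function name and return format are assumptions, not the project's code.
import GPUtil
import psutil

def get_system_info() -> str:
    """Return a short human-readable summary of CPU, RAM, and GPU load."""
    cpu = psutil.cpu_percent(interval=0.5)   # % CPU averaged over a short sample
    ram = psutil.virtual_memory().percent    # % RAM currently in use
    lines = [f"CPU: {cpu:.0f}%  RAM: {ram:.0f}%"]
    for gpu in GPUtil.getGPUs():             # empty list if no NVIDIA GPU is present
        lines.append(f"GPU {gpu.name}: load {gpu.load * 100:.0f}%")
    return "\n".join(lines)

if __name__ == "__main__":
    print(get_system_info())
```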

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (like Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Users should verify licensing for all dependencies, especially for commercial or closed-source use.

Limitations & Caveats

The local version's performance depends heavily on the user's hardware, and the README strongly recommends the online version for better quality and speed. Some tools, such as to_do_list.py, are noted as not yet integrated as callable tools. The camera.py implementation is described as returning a string rather than maintaining an open feed.
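
For reference, a one-shot capture of the kind described for camera.py, built on opencv-python, might look roughly like this; the function name and base64 encoding are assumptions.

```python
# One-shot capture returned as a string, mirroring the camera.py behavior the
# README describes; the function name and base64 encoding are assumptions.
import base64

import cv2

def capture_frame_as_string() -> str:
    cap = cv2.VideoCapture(0)      # open the default camera
    ok, frame = cap.read()         # grab a single frame
    cap.release()                  # no persistent feed is kept open
    if not ok:
        return "Camera capture failed."
    ok, buf = cv2.imencode(".jpg", frame)
    return base64.b64encode(buf.tobytes()).decode("ascii") if ok else "Encoding failed."
```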

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 10 stars in the last 30 days
