ada by Nlouis38

AI assistant for STEM with voice/text interaction

Created 5 months ago
269 stars

Top 95.5% on SourcePulse

Project Summary

ADA (Advanced Design Assistant) is an AI assistant specializing in STEM fields, offering voice and text interaction for concise, accurate information and task assistance. It provides both a cloud-dependent online version leveraging Google Gemini and ElevenLabs for enhanced performance, and a local version reliant on user hardware and Ollama. The project is ideal for STEM professionals and researchers seeking an interactive AI assistant.

How It Works

ADA utilizes a modular architecture with distinct local and online components. The online version integrates Google Gemini for advanced natural language understanding and response generation, coupled with ElevenLabs for high-quality, low-latency Text-to-Speech (TTS). The local version relies on Ollama to serve models like Gemma, with performance directly tied to the user's hardware. Both versions employ RealtimeSTT for speech-to-text transcription and support function calling for task execution, such as accessing system information, managing timers, or performing web searches.
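
As a rough illustration of that loop (not the project's actual code), the sketch below wires RealtimeSTT transcription into Gemini with automatic function calling; the set_timer tool and the GEMINI_API_KEY variable name are illustrative assumptions, and in the real assistant a TTS engine would speak the reply.

```python
# Minimal sketch of the online pipeline described above (not ADA's actual code):
# RealtimeSTT transcribes speech and Gemini replies, with a toy tool standing in
# for ADA's real ones. GEMINI_API_KEY is an assumed environment-variable name.
import os

import google.generativeai as genai
from RealtimeSTT import AudioToTextRecorder

def set_timer(seconds: int) -> str:
    """Illustrative stand-in for one of ADA's callable tools."""
    return f"Timer set for {seconds} seconds."

genai.configure(api_key=os.environ["GEMINI_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash", tools=[set_timer])
chat = model.start_chat(enable_automatic_function_calling=True)

recorder = AudioToTextRecorder()          # listens on the default microphone
while True:
    user_text = recorder.text()           # blocks until an utterance is transcribed
    reply = chat.send_message(user_text)  # Gemini may invoke set_timer() on its own
    print("ADA:", reply.text)             # a TTS engine would speak this in ADA
```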

Quick Start & Requirements

  • Installation: Clone the repository, create a virtual environment, and install dependencies with pip install -r requirements.txt (if no requirements.txt is present, install the key packages individually: ollama, websockets, pyaudio, RealtimeSTT, RealtimeTTS, torch, google-generativeai, opencv-python, pillow, mss, psutil, GPUtil, elevenlabs, python-dotenv, python-weather, googlemaps).
  • Prerequisites: Python 3.11+, Ollama (for the local version), a CUDA-compatible GPU (optional, recommended for local use), a microphone and speakers (headphones recommended), and API keys (Google Gemini, ElevenLabs, Google Maps) stored in a .env file (see the sketch after this list). FFmpeg is recommended for audio processing.
  • Setup: Expect setup to consist of cloning the repository, creating the environment, installing dependencies, and configuring API keys.
  • Links: GitHub Repository
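
The README names the required API keys but not their variable names; a minimal sketch of the .env setup using python-dotenv might look like the following, with GEMINI_API_KEY, ELEVENLABS_API_KEY, and GOOGLE_MAPS_API_KEY as assumed names.

```python
# Sketch of loading the API keys with python-dotenv; the variable names below are
# assumptions, not taken from the repository.
#
# .env (project root):
#   GEMINI_API_KEY=...
#   ELEVENLABS_API_KEY=...
#   GOOGLE_MAPS_API_KEY=...
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory into the process environment
gemini_key = os.getenv("GEMINI_API_KEY")
elevenlabs_key = os.getenv("ELEVENLABS_API_KEY")
maps_key = os.getenv("GOOGLE_MAPS_API_KEY")
```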

Highlighted Details

  • Dual operational modes: ada_local and ada_online.
  • Real-time voice interaction via RealtimeSTT and RealtimeTTS (ElevenLabs or SystemEngine).
  • Function calling capabilities for task automation (e.g., system info, timers, project folders, weather, travel duration); a sketch of one such tool follows this list.
  • Multimodal demo (multimodal_live_api.py) supporting camera or screen sharing with audio.
  • STEM-focused knowledge base.
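
To make the function-calling bullet concrete, here is a hedged sketch of a system-info tool built on psutil and GPUtil from the dependency list; the function name and output format are illustrative, not taken from the repository.

```python
# Illustrative system-info tool of the kind ADA could expose for function calling;
# the function name and return format are assumptions, not the project's code.
import GPUtil
import psutil

def get_system_info() -> str:
    """Return a short human-readable summary of CPU, RAM, and GPU load."""
    cpu = psutil.cpu_percent(interval=0.5)   # % CPU averaged over a short sample
    ram = psutil.virtual_memory().percent    # % RAM currently in use
    lines = [f"CPU: {cpu:.0f}%  RAM: {ram:.0f}%"]
    for gpu in GPUtil.getGPUs():             # empty list if no NVIDIA GPU is present
        lines.append(f"GPU {gpu.name}: load {gpu.load * 100:.0f}%")
    return "\n".join(lines)

if __name__ == "__main__":
    print(get_system_info())
```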

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (like Discord/Slack) are provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Users should verify licensing for all dependencies, especially for commercial or closed-source use.

Limitations & Caveats

The local version's performance depends heavily on the user's hardware, and the README strongly recommends the online version for better quality and speed. Some tools, such as to_do_list.py, are noted as not yet integrated as callable tools. The camera.py implementation is described as returning a string rather than maintaining an open feed.
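
For reference, a one-shot capture of the kind described for camera.py, built on opencv-python, might look roughly like this; the function name and base64 encoding are assumptions.

```python
# One-shot capture returned as a string, mirroring the camera.py behavior the
# README describes; the function name and base64 encoding are assumptions.
import base64

import cv2

def capture_frame_as_string() -> str:
    cap = cv2.VideoCapture(0)      # open the default camera
    ok, frame = cap.read()         # grab a single frame
    cap.release()                  # no persistent feed is kept open
    if not ok:
        return "Camera capture failed."
    ok, buf = cv2.imencode(".jpg", frame)
    return base64.b64encode(buf.tobytes()).decode("ascii") if ok else "Encoding failed."
```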

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 10 stars in the last 30 days
