ada_v2 by nazirlouis

Multimodal AI assistant for design and automation

Created 2 months ago

361 stars

Top 78.1% on SourcePulse

Project Summary

Advanced Design Assistant (ADA) V2 is a sophisticated AI assistant for multimodal interaction, targeting developers and power users. It integrates real-time voice control, parametric 3D CAD generation, gesture-based UI manipulation, and autonomous web browsing within a desktop application, streamlining complex design workflows.

How It Works

This Electron desktop app uses a Python (FastAPI) backend orchestrating Google's Gemini 2.5 Native Audio for voice, MediaPipe for hand gestures and face authentication, and build123d for parametric CAD. Its architecture facilitates seamless data flow between the React frontend and backend services, enabling autonomous web agents via Playwright and direct 3D printer integration. This approach leverages cutting-edge AI and computer vision libraries for a novel, interactive design environment.

Quick Start & Requirements

Installation involves cloning the repo, setting up a Python 3.11 Conda environment (conda create -n ada_v2 python=3.11), installing Python dependencies (pip install -r requirements.txt), installing Playwright browsers (playwright install chromium), and frontend dependencies (npm install). Launch via npm run dev. Prerequisites include Node.js 18+, a webcam, and a Google Gemini API key configured in .env. macOS users require portaudio (brew install portaudio).

Highlighted Details

Multimodal AI: Real-time voice, computer vision (hand tracking, face auth), and gesture control.
Parametric CAD Generation: Creates editable 3D models from voice prompts using build123d.
Gesture-Controlled UI: "Minority Report" style interface for window manipulation.
Autonomous Web Agent: Automates browser tasks via Playwright.
3D Printing Integration: Slices models (OrcaSlicer) and sends jobs to compatible printers.
Face Authentication: Secure, local biometric login.

Maintenance & Community

Contribution guidelines are provided. However, the README lacks details on specific maintainers, community channels (Discord/Slack), or sponsorships, offering limited insight into project governance beyond the contribution process.

Licensing & Compatibility

Licensed under the permissive MIT License, allowing broad use, modification, and distribution, including in commercial and closed-source applications, with minimal restrictions beyond attribution.

Limitations & Caveats

Untested on Linux; primarily validated on macOS 14+ and Windows 10/11. Requires internet for Gemini API access (no offline mode). Face authentication is single-user (reference.jpg). Heavy Gemini API use may hit free-tier rate limits. A functional webcam is mandatory.

Health Check

Last Commit

2 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

53 stars in the last 30 days