ada_v2  by nazirlouis

Multimodal AI assistant for design and automation

Created 1 month ago
258 stars

Top 98.1% on SourcePulse

GitHubView on GitHub
Project Summary

Advanced Design Assistant (ADA) V2 is a sophisticated AI assistant for multimodal interaction, targeting developers and power users. It integrates real-time voice control, parametric 3D CAD generation, gesture-based UI manipulation, and autonomous web browsing within a desktop application, streamlining complex design workflows.

How It Works

This Electron desktop app uses a Python (FastAPI) backend orchestrating Google's Gemini 2.5 Native Audio for voice, MediaPipe for hand gestures and face authentication, and build123d for parametric CAD. Its architecture facilitates seamless data flow between the React frontend and backend services, enabling autonomous web agents via Playwright and direct 3D printer integration. This approach leverages cutting-edge AI and computer vision libraries for a novel, interactive design environment.

Quick Start & Requirements

Installation involves cloning the repo, setting up a Python 3.11 Conda environment (conda create -n ada_v2 python=3.11), installing Python dependencies (pip install -r requirements.txt), installing Playwright browsers (playwright install chromium), and frontend dependencies (npm install). Launch via npm run dev. Prerequisites include Node.js 18+, a webcam, and a Google Gemini API key configured in .env. macOS users require portaudio (brew install portaudio).

Highlighted Details

  • Multimodal AI: Real-time voice, computer vision (hand tracking, face auth), and gesture control.
  • Parametric CAD Generation: Creates editable 3D models from voice prompts using build123d.
  • Gesture-Controlled UI: "Minority Report" style interface for window manipulation.
  • Autonomous Web Agent: Automates browser tasks via Playwright.
  • 3D Printing Integration: Slices models (OrcaSlicer) and sends jobs to compatible printers.
  • Face Authentication: Secure, local biometric login.

Maintenance & Community

Contribution guidelines are provided. However, the README lacks details on specific maintainers, community channels (Discord/Slack), or sponsorships, offering limited insight into project governance beyond the contribution process.

Licensing & Compatibility

Licensed under the permissive MIT License, allowing broad use, modification, and distribution, including in commercial and closed-source applications, with minimal restrictions beyond attribution.

Limitations & Caveats

Untested on Linux; primarily validated on macOS 14+ and Windows 10/11. Requires internet for Gemini API access (no offline mode). Face authentication is single-user (reference.jpg). Heavy Gemini API use may hit free-tier rate limits. A functional webcam is mandatory.

Health Check
Last Commit

2 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
2
Issues (30d)
3
Star History
259 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.