VisionClaw by sseanliu

Real-time AI assistant for smart glasses

Created 2 weeks ago


1,310 stars

Top 30.1% on SourcePulse

Project Summary

VisionClaw offers a real-time AI assistant for Meta Ray-Ban smart glasses, integrating voice, vision, and agentic actions via Gemini Live and optional OpenClaw. It targets users seeking hands-free, context-aware assistance, enabling actions through connected apps.

How It Works

An iOS app bridges Meta glasses (or iPhone camera) with the Gemini Live API. Video (~1fps JPEG) and audio (16kHz PCM) stream to Gemini, which processes input for real-time visual description and voice understanding. Gemini responds with audio or tool calls. The optional OpenClaw gateway translates these calls into actions across 56+ skills (messaging, web search, smart home), enabling agentic capabilities. This architecture prioritizes native audio handling and direct tool execution.
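The ~1fps video rate implies some throttling between the camera feed and the Gemini Live session. Here is a minimal sketch of such a throttle; the type and method names are hypothetical, as the repo's actual implementation is not shown in the README:

```swift
import Foundation

// Sketch of a ~1 fps throttle an app could apply before sending JPEG
// frames to Gemini Live. All names here are illustrative, not from the repo.
final class FrameThrottle {
    private let minInterval: TimeInterval
    private var lastSent: TimeInterval = -.infinity

    init(framesPerSecond: Double = 1.0) {
        self.minInterval = 1.0 / framesPerSecond
    }

    /// Returns true if a frame arriving at `now` should be forwarded.
    func shouldSend(at now: TimeInterval) -> Bool {
        guard now - lastSent >= minInterval else { return false }
        lastSent = now
        return true
    }
}
```

In practice the camera delivers ~30 fps, so a gate like this drops roughly 29 of every 30 frames before JPEG encoding, keeping bandwidth to the API low.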

Quick Start & Requirements

Clone the repo and open CameraAccess.xcodeproj in Xcode. Configure your Gemini API key (free from Google AI Studio) in GeminiConfig.swift. Requires iOS 17.0+ and Xcode 15.0+. You can test without glasses using the iPhone camera mode. For agentic actions, set up the OpenClaw gateway on a local machine, ensure the phone can reach it over the network, and enable its chatCompletions endpoint.
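The README names GeminiConfig.swift as the place for the key but does not show its contents. A hypothetical shape, with all field names being guesses:

```swift
import Foundation

// Hypothetical contents of GeminiConfig.swift -- only the file name and the
// need for a Gemini API key come from the README; the field names are guesses.
enum GeminiConfig {
    /// Paste the key generated at Google AI Studio here before building.
    static let apiKey = "YOUR_GEMINI_API_KEY"
}
```

If you fork the project, keep the real key out of version control (e.g. git-ignore a local config file) rather than committing it.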

Highlighted Details

  • Real-time streaming pipeline: ~1fps video, bidirectional audio between glasses/iPhone, app, and Gemini Live.
  • Gemini Live API: Native audio handling, bypassing separate STT/TTS.
  • OpenClaw extensibility: Access to 56+ skills for diverse task execution.
  • Flexible testing: Supports Meta Ray-Ban glasses and iPhone camera.
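For the OpenClaw hop, the app presumably relays Gemini tool calls to the gateway's chatCompletions endpoint. A hedged sketch of building such a request; the host, port, URL path, and JSON field names are all assumptions, not taken from the repo:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest on Linux
#endif

// Illustrative only: forwards a Gemini tool call to a local OpenClaw gateway.
// The chatCompletions endpoint is mentioned in the README; the URL path,
// default port, and payload shape below are assumptions.
struct ToolCall {
    let name: String       // e.g. "web_search"
    let arguments: String  // JSON-encoded arguments from Gemini
}

func openClawRequest(for call: ToolCall,
                     host: String = "localhost",
                     port: Int = 8080) -> URLRequest? {
    guard let url = URL(string: "http://\(host):\(port)/v1/chat/completions") else {
        return nil
    }
    var request = URLRequest(url: url)
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "messages": [[
            "role": "user",
            "content": "Run skill \(call.name) with arguments \(call.arguments)"
        ]]
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)
    return request
}
```

The chat-completions shape is convenient here because the gateway can treat each tool call as an ordinary message and route it to whichever of its 56+ skills matches.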

Maintenance & Community

The README provides no specific details on maintainers, contributors, community channels, or a roadmap.

Licensing & Compatibility

Licensed under terms in the root LICENSE file. Specifics on commercial use or closed-source compatibility are not detailed in the README.

Limitations & Caveats

Agentic actions require optional OpenClaw setup. Primary use case needs Meta Ray-Ban glasses; iPhone mode serves as a testing alternative. The project is iOS-specific and depends on external APIs (Gemini Live).

Health Check
Last Commit

1 day ago

Responsiveness

Inactive

Pull Requests (30d)
3
Issues (30d)
9
Star History
1,319 stars in the last 19 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jinze Bai (Research Scientist at Alibaba Qwen), and 4 more.

self-operating-computer by OthersideAI

Top 0.1% on SourcePulse
10k stars
Framework for multimodal computer operation
Created 2 years ago
Updated 5 months ago