OpenAIglasses_for_Navigation by AI-FanGe

AI navigation and assistance system for the visually impaired

Created 5 months ago
1,215 stars

Top 31.9% on SourcePulse

View on GitHub
Project Summary

AI-FanGe/OpenAIglasses_for_Navigation is an open-source framework for AI-driven assistive navigation and interaction, designed for visually impaired users. It integrates computer vision and NLP to provide real-time guidance for navigation, object recognition, and environmental awareness, aiming to enhance independence and safety.

How It Works

The system uses a FastAPI backend to process real-time video/audio streams. It leverages deep learning models like YOLOv8 for segmentation (blind paths, lanes) and YOLO-E for open-vocabulary object search, alongside MediaPipe for hand tracking. Optical flow (Lucas-Kanade) stabilizes input, while Aliyun DashScope provides ASR and multimodal chat (Qwen-Omni-Turbo) for voice interaction. Feedback is multimodal: visual annotations, voice, and hand guidance.
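The Lucas-Kanade step mentioned above can be illustrated with a minimal sketch. This is not the project's code (the repo presumably relies on OpenCV's pyramidal implementation); it is a single-window, pure-NumPy version of the underlying least-squares problem, solving Ix·u + Iy·v = -It over a patch:

```python
import numpy as np

def lk_flow(prev, curr, y, x, win=11):
    """One Lucas-Kanade step: estimate the (u, v) motion of the window
    centred at (y, x) by least squares on Ix*u + Iy*v = -It."""
    Iy, Ix = np.gradient(prev.astype(float))       # spatial gradients (full frame)
    It = curr.astype(float) - prev.astype(float)   # temporal gradient
    h = win // 2
    sl = (slice(y - h, y + h + 1), slice(x - h, x + h + 1))
    A = np.stack([Ix[sl].ravel(), Iy[sl].ravel()], axis=1)
    b = -It[sl].ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v  # flow along x and y, in pixels

# Synthetic check: a bright square shifted one pixel to the right.
f0 = np.zeros((32, 32)); f0[12:20, 12:20] = 1.0
f1 = np.roll(f0, 1, axis=1)
u, v = lk_flow(f0, f1, 16, 16)  # u close to +1.0, v close to 0.0
```

A stabilizer would run this over many tracked corners and warp each frame by the aggregate motion before it reaches the segmentation models.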

Quick Start & Requirements

  • Installation: Clone the repo, create a Python 3.9-3.11 virtual environment, and run pip install -r requirements.txt. GPU acceleration requires CUDA 11.8+.
  • Prerequisites:
    • Hardware: Dev/Server: Intel i5+ CPU, NVIDIA GPU (CUDA 11.8+), 8GB+ RAM. Optional client: ESP32-CAM, mic, speakers.
    • Software: Python 3.9-3.11, CUDA 11.8+, modern browser.
    • API Keys: Mandatory Aliyun DashScope API key.
  • Setup: Requires downloading models (some links missing), configuring API keys (.env), and running python app_main.py.
  • Resources: Quick-start guide in README.
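The .env step above amounts to reading KEY=VALUE pairs before launch. As a rough sketch (the variable name DASHSCOPE_API_KEY is an assumption, and real projects typically use the python-dotenv package instead of hand-rolling this):

```python
import tempfile

def load_env(path):
    """Minimal .env reader: collect KEY=VALUE lines into a dict.
    No quoting/escaping rules; python-dotenv handles those properly."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                env[key.strip()] = value.strip()
    return env

# Demo with a throwaway file; the real .env sits in the repo root.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# Aliyun DashScope credentials\nDASHSCOPE_API_KEY=sk-demo\n")
cfg = load_env(fh.name)
api_key = cfg.get("DASHSCOPE_API_KEY")  # key name is an assumption
```

Failing fast when the key is absent, before python app_main.py spins up the video pipeline, saves a confusing mid-stream API error later.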

Highlighted Details

  • Navigation Suite: Blind path detection/guidance, obstacle avoidance, turn alerts, crossing assistance with traffic light recognition.
  • Object Interaction: Voice-commanded item search, real-time tracking, hand guidance, grab confirmation.
  • Multimodal AI: Real-time ASR and advanced multimodal dialogue via Aliyun DashScope.
  • Web Interface: Live video stream with annotations, status panels, and IMU 3D pose visualization.
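The voice-commanded search flow boils down to: transcribe speech, extract a target phrase, and filter detections for it. The glue below is hypothetical (function names, trigger phrases, and the detection dict layout are all invented for illustration; the project itself pairs DashScope ASR with YOLO-E open-vocabulary detection):

```python
def parse_command(transcript):
    """Pull a search target out of phrases like 'find my water bottle'."""
    text = transcript.lower()
    for trigger in ("where is", "look for", "find"):
        if trigger in text:
            target = text.split(trigger, 1)[1].strip(" ?.!")
            for article in ("my ", "the ", "a "):   # drop a leading article
                if target.startswith(article):
                    target = target[len(article):]
            return target
    return None

def filter_detections(detections, target, min_conf=0.5):
    """Keep confident detections whose label matches the requested object."""
    return [d for d in detections if d["label"] == target and d["conf"] >= min_conf]

dets = [{"label": "water bottle", "conf": 0.82, "box": (120, 60, 200, 180)},
        {"label": "cup", "conf": 0.64, "box": (300, 90, 350, 160)}]
target = parse_command("Where is my water bottle?")   # "water bottle"
hits = filter_detections(dets, target)                # the 0.82 detection
```

In the real system the matched box would then feed the tracker and the hand-guidance loop rather than being returned directly.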

Maintenance & Community

The project is explicitly for "exchange and learning only" and not production-ready. The README provides no details on active maintenance, contributors, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

Released under the permissive MIT License, allowing broad usage, including commercial applications and integration into closed-source projects, with standard attribution requirements.

Limitations & Caveats

  • Not Production-Ready: Explicitly stated as for "exchange and learning only," not for direct use by visually impaired individuals.
  • Incomplete Setup: Download links for some critical model files are missing (marked "[待补充]", i.e. "to be added").
  • API Dependency: Relies on Aliyun DashScope, potentially incurring costs.
  • Hardware: Optimal performance requires a capable NVIDIA GPU.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 49 stars in the last 30 days

Explore Similar Projects

Starred by Peer Richelsen (Cofounder of Cal.com) and Jared Palmer (SVP at GitHub; Founder of Turborepo; Author of Formik, TSDX).

VisionClaw by Intent-Lab (Top 2.9%, 2k stars)
Real-time AI assistant for smart glasses
Created 2 months ago, updated 1 week ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jinze Bai (Research Scientist at Alibaba Qwen), and 4 more.

self-operating-computer by OthersideAI (Top 0.2%, 10k stars)
Framework for multimodal computer operation
Created 2 years ago, updated 6 months ago