OpenAIglasses_for_Navigation by AI-FanGe

AI navigation and assistance system for the visually impaired

Created 2 months ago
1,060 stars

Top 35.7% on SourcePulse

View on GitHub
Project Summary

AI-FanGe/OpenAIglasses_for_Navigation is an open-source framework for AI-driven assistive navigation and interaction, designed for visually impaired users. It integrates computer vision and NLP to provide real-time guidance for navigation, object recognition, and environmental awareness, aiming to enhance independence and safety.

How It Works

The system uses a FastAPI backend to process real-time video and audio streams. It leverages deep learning models such as YOLOv8 for segmentation (blind paths, lanes) and YOLO-E for open-vocabulary object search, alongside MediaPipe for hand tracking. Lucas-Kanade optical flow stabilizes the incoming video, while Aliyun DashScope provides ASR and multimodal chat (Qwen-Omni-Turbo) for voice interaction. Feedback is delivered through visual annotations, voice prompts, and hand guidance.
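The stabilization step can be illustrated with a short OpenCV sketch. This is a minimal, hypothetical example of Lucas-Kanade-based frame stabilization, not the repository's actual code; the `stabilize` helper and its parameter values are illustrative.

```python
# Minimal Lucas-Kanade stabilization sketch (illustrative, not the repo's code).
import cv2
import numpy as np

def stabilize(prev_gray, gray, frame):
    """Estimate global translation between frames and shift `frame` to compensate."""
    # Track corner features from the previous frame into the current one.
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=10)
    if prev_pts is None:
        return frame
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
    good = status.ravel() == 1
    if good.sum() < 10:  # too few tracked points to trust the estimate
        return frame
    # Median displacement is robust to outliers such as moving objects.
    dx, dy = np.median(next_pts[good] - prev_pts[good], axis=0).ravel()
    shift = np.float32([[1, 0, -dx], [0, 1, -dy]])
    h, w = frame.shape[:2]
    return cv2.warpAffine(frame, shift, (w, h))
```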

Quick Start & Requirements

  • Installation: Clone the repo, create a Python 3.9-3.11 virtual environment, and run pip install -r requirements.txt. GPU acceleration requires CUDA 11.8+.
  • Prerequisites:
    • Hardware: Dev/Server: Intel i5+ CPU, NVIDIA GPU (CUDA 11.8+), 8GB+ RAM. Optional client: ESP32-CAM, mic, speakers.
    • Software: Python 3.9-3.11, CUDA 11.8+, modern browser.
    • API Keys: Mandatory Aliyun DashScope API key.
  • Setup: Requires downloading models (some download links are still missing), configuring API keys in a .env file (see the sketch after this list), and running python app_main.py.
  • Resources: Quick-start guide in README.
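The .env-based key configuration can be sketched as follows, assuming python-dotenv and the dashscope SDK are installed. DASHSCOPE_API_KEY is the SDK's conventional environment variable name, but check the repository's .env template for the name it actually expects.

```python
# Hypothetical API-key bootstrap; DASHSCOPE_API_KEY is an assumed variable name.
import os

import dashscope                # pip install dashscope
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # read key=value pairs from a local .env file into the environment
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
if not dashscope.api_key:
    raise RuntimeError("Set DASHSCOPE_API_KEY in .env before running app_main.py")
```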

Highlighted Details

  • Navigation Suite: Blind path detection/guidance, obstacle avoidance, turn alerts, crossing assistance with traffic light recognition.
  • Object Interaction: Voice-commanded item search, real-time tracking, hand guidance (sketched after this list), and grab confirmation.
  • Multimodal AI: Real-time ASR and advanced multimodal dialogue via Aliyun DashScope.
  • Web Interface: Live video stream with annotations, status panels, and IMU 3D pose visualization.
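The hand-guidance flow can be sketched as: locate the user's hand with MediaPipe, compare its position to the tracked object's center, and emit a spoken direction. This is a minimal illustration, not the project's implementation; the `guidance_hint` function, the tolerance value, and the normalized target coordinates are all assumptions.

```python
# Illustrative hand-guidance sketch using MediaPipe Hands; the threshold and
# target coordinates are assumptions, not values from the repository.
import cv2
import mediapipe as mp

hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.5)

def guidance_hint(frame_bgr, target_xy, tol=0.05):
    """Suggest a direction that moves the index fingertip toward target_xy.

    target_xy: object center in normalized [0, 1] image coordinates.
    Note: left/right semantics depend on camera mirroring; illustrative only.
    """
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    results = hands.process(rgb)  # MediaPipe expects RGB input
    if not results.multi_hand_landmarks:
        return "no hand detected"
    tip = results.multi_hand_landmarks[0].landmark[
        mp.solutions.hands.HandLandmark.INDEX_FINGER_TIP]
    dx, dy = target_xy[0] - tip.x, target_xy[1] - tip.y
    if abs(dx) <= tol and abs(dy) <= tol:
        return "on target, grab now"
    horizontal = "right" if dx > 0 else "left"
    vertical = "down" if dy > 0 else "up"
    return f"move {horizontal} and {vertical}"
```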

Maintenance & Community

The project is explicitly for "exchange and learning only" and not production-ready. The README provides no details on active maintenance, contributors, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

Released under the permissive MIT License, allowing broad usage, including commercial applications and integration into closed-source projects, with standard attribution requirements.

Limitations & Caveats

  • Not Production-Ready: Explicitly stated as for "exchange and learning only," not for direct use by visually impaired individuals.
  • Incomplete Setup: Download links for some critical model files are missing (marked "[待补充]", i.e., "to be added").
  • API Dependency: Relies on Aliyun DashScope, potentially incurring costs.
  • Hardware: Optimal performance requires a capable NVIDIA GPU.
Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 76 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jinze Bai (Research Scientist at Alibaba Qwen), and 4 more.

self-operating-computer by OthersideAI

Top 0.3% · 10k stars
Framework for multimodal computer operation
Created 2 years ago · Updated 3 months ago