Vision-Agents by GetStream

Build real-time vision agents with any model or provider

Created 11 months ago

7,970 stars

Top 6.4% on SourcePulse


1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

GetStream/Vision-Agents is a framework for rapidly building real-time video AI applications. It lets developers combine object detection models (e.g., YOLO) with LLMs (OpenAI, Gemini, Claude) and aims for very low latency by routing media over Stream's edge network. The project targets developers building video analysis tools for use cases such as sports coaching, surveillance, and interactive gaming.

How It Works

The core Agent class orchestrates LLM interactions alongside specialized processors. Processors run auxiliary AI models (such as YOLO for pose estimation) and handle tasks like API calls, audio/video manipulation, and state tracking. This modular design makes it straightforward to combine different AI capabilities: the agent feeds real-time video and audio into the LLM for analysis, while Stream's edge infrastructure keeps latency low.
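To make the Agent/processor split concrete, here is a minimal sketch of the pattern in plain Python. Every class in it (Frame, Processor, PoseProcessor, StateTracker, EchoLLM, Agent) is hypothetical and is not the Vision-Agents API; the pose processor attaches placeholder keypoints instead of running YOLO, and the LLM is an offline stub.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Frame:
    """Hypothetical video frame: pixels omitted, annotations added by processors."""
    index: int
    annotations: dict = field(default_factory=dict)


class Processor(Protocol):
    """Hypothetical processor interface (not the Vision-Agents API)."""
    def process(self, frame: Frame) -> Frame: ...


class PoseProcessor:
    """Stand-in for a YOLO pose model; attaches fixed keypoints, no inference."""
    def process(self, frame: Frame) -> Frame:
        frame.annotations["pose"] = [(0.5, 0.4), (0.5, 0.6)]  # placeholder keypoints
        return frame


class StateTracker:
    """Counts frames seen so far, illustrating per-session state tracking."""
    def __init__(self) -> None:
        self.frames_seen = 0

    def process(self, frame: Frame) -> Frame:
        self.frames_seen += 1
        frame.annotations["frames_seen"] = self.frames_seen
        return frame


class EchoLLM:
    """Offline placeholder LLM: echoes the prompt and the annotation keys it received."""
    def respond(self, prompt: str, context: dict) -> str:
        return f"{prompt} | keys={sorted(context)}"


class Agent:
    """Runs each frame through the processors in order, then hands the result to the LLM."""
    def __init__(self, llm: EchoLLM, processors: list[Processor]) -> None:
        self.llm = llm
        self.processors = processors

    def on_frame(self, frame: Frame, prompt: str) -> str:
        for processor in self.processors:
            frame = processor.process(frame)
        return self.llm.respond(prompt, frame.annotations)


agent = Agent(EchoLLM(), [PoseProcessor(), StateTracker()])
print(agent.on_frame(Frame(index=0), "Check the squat form"))
```

Keeping processors behind one small interface is what lets a pose model, a tracker, or an API caller be added or removed without touching the agent loop.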

Quick Start & Requirements

Installation: The framework is primarily Python-based; some processors require specific model weights (e.g., yolo11n-pose.pt).
Prerequisites: A Python environment, LLM API keys (e.g., OpenAI, Gemini), and, depending on which models run locally, suitable hardware for inference. Client SDKs are available for React, Android, iOS, Flutter, React Native, and Unity.
Documentation: Guides and tutorials at VisionAgents.ai.
Resource Footprint: Not specified, but real-time video AI can be resource-intensive.
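A small pre-flight check along these lines can catch missing keys or weights before an agent starts. It is a sketch only: the environment variable names OPENAI_API_KEY and GEMINI_API_KEY, the weights path, and the helper missing_prerequisites are assumptions for illustration, not settings or functions confirmed by the project.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence

# Assumed variable names; confirm against the project's documentation.
REQUIRED_ENV_VARS = ["OPENAI_API_KEY", "GEMINI_API_KEY"]

# The README mentions yolo11n-pose.pt; keeping it in the working directory is an assumption.
REQUIRED_WEIGHTS = [Path("yolo11n-pose.pt")]


def missing_prerequisites(
    env: Optional[Mapping[str, str]] = None,
    weights: Sequence[Path] = REQUIRED_WEIGHTS,
) -> list[str]:
    """Hypothetical helper: describe each missing key or weights file (empty list if ready)."""
    env = os.environ if env is None else env
    problems = [f"env var {name} is not set" for name in REQUIRED_ENV_VARS if not env.get(name)]
    problems += [f"weights file {path} not found" for path in weights if not path.is_file()]
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```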

Highlighted Details

Real-time Performance: Engineered for low-latency video AI; the project claims 500 ms join times and 30 ms audio/video latency.
Model Agnosticism: Supports multiple LLMs (OpenAI, Gemini, Claude) and vision models (YOLO, Roboflow).
Extensive SDKs: Client libraries for web, mobile, and game development platforms.
Diverse Applications: Enables sports coaching, drone detection, physical therapy assistance, and invisible AI assistants.
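Model agnosticism usually comes down to coding against one interface and choosing the backend from configuration. The sketch below shows that idea with a hypothetical LLMProvider protocol, a FakeProvider stand-in, and a make_llm registry lookup; it makes no real API calls and does not reflect how Vision-Agents actually wires up its OpenAI, Gemini, or Claude integrations.

```python
from typing import Callable, Protocol


class LLMProvider(Protocol):
    """Hypothetical common interface; real provider SDKs differ and would need adapters."""
    name: str

    def complete(self, prompt: str) -> str: ...


class FakeProvider:
    """Offline stand-in for a hosted LLM; returns a tagged echo instead of calling an API."""
    def __init__(self, name: str) -> None:
        self.name = name

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"


# Hypothetical registry mapping a config string to a provider factory.
PROVIDERS: dict[str, Callable[[], LLMProvider]] = {
    "openai": lambda: FakeProvider("openai"),
    "gemini": lambda: FakeProvider("gemini"),
    "claude": lambda: FakeProvider("claude"),
}


def make_llm(provider: str) -> LLMProvider:
    """Build the provider named in configuration, rejecting unknown names."""
    try:
        return PROVIDERS[provider]()
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None


print(make_llm("gemini").complete("Describe the frame"))
```

With this layout, adding a provider means writing one adapter and one registry entry; code that calls make_llm does not change.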

Maintenance & Community

The project is developed by Stream, and its README highlights key figures in AI research. A roadmap indicates ongoing development, with planned additions including broader model support (Roboflow, Qwen3, Moondream vision) and enhanced WebRTC capabilities. Community interaction channels are not explicitly detailed.

Licensing & Compatibility

The provided README does not state a specific open-source license, a critical omission when assessing the project for adoption. The project emphasizes compatibility with Stream Chat and various LLM and video providers.

Limitations & Caveats

The project appears to be in active development, with several features listed as "Coming Soon" and acknowledged limitations in its underlying WebRTC library. Reliance on Stream's proprietary edge network for optimal performance may present vendor-specific integration challenges. The absence of explicit licensing information poses a significant adoption barrier.

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive

Star History: 100 stars in the last 30 days