Vision-Agents by GetStream

Build real-time vision agents with any model or provider

Created 2 months ago
528 stars

Top 59.9% on SourcePulse

Project Summary

GetStream/Vision-Agents is a framework for rapidly developing real-time video AI applications. It lets developers combine vision models (e.g., YOLO) with LLMs (OpenAI, Gemini, Claude) at low latency over Stream's edge network. The project targets developers building video analysis tools for applications such as sports coaching, surveillance, and interactive gaming.

How It Works

The core Agent class orchestrates LLM interactions with specialized processors. These processors execute auxiliary AI models (like YOLO for pose estimation) and manage tasks such as API calls, audio/video manipulation, and state tracking. This modular design facilitates flexible integration of various AI capabilities, feeding real-time video and audio into LLMs for analysis, optimized for low latency via Stream's edge infrastructure.
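The orchestration pattern described above can be sketched in plain Python. This is a minimal illustration only, not the Vision Agents API: `Processor`, `FrameSizeProcessor`, `FrameCounter`, and `SketchAgent` are hypothetical names, and the processors are trivial stand-ins for real models such as a YOLO pose estimator. The sketch shows each processor turning a frame into observations, which an agent then gathers into context for an LLM prompt.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class Processor(Protocol):
    """Hypothetical interface: turns one frame into named observations."""

    name: str

    def process(self, frame: Any) -> dict: ...


@dataclass
class FrameSizeProcessor:
    """Stand-in for an auxiliary model such as a pose estimator."""

    name: str = "frame_size"

    def process(self, frame: Any) -> dict:
        return {"bytes": len(frame)}


@dataclass
class FrameCounter:
    """Stand-in for a state-tracking processor."""

    name: str = "frame_counter"
    seen: int = 0

    def process(self, frame: Any) -> dict:
        self.seen += 1
        return {"frames_seen": self.seen}


@dataclass
class SketchAgent:
    """Runs every processor on a frame and builds the context for an LLM."""

    processors: list = field(default_factory=list)

    def build_context(self, frame: Any) -> dict:
        # One entry per processor, keyed by the processor's name.
        return {p.name: p.process(frame) for p in self.processors}

    def prompt_for(self, frame: Any) -> str:
        # A real agent would send this text (plus media) to an LLM.
        context = self.build_context(frame)
        lines = [f"{name}: {obs}" for name, obs in context.items()]
        return "Observations for the current frame:\n" + "\n".join(lines)
```

Keeping processors behind one small interface is what makes the design modular: swapping a pose model for an object detector would only mean adding another class with a `name` and a `process` method.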

Quick Start & Requirements

  • Installation: Primarily Python-based; requires specific model weights (e.g., yolo11n-pose.pt).
  • Prerequisites: Python environment, LLM API keys (OpenAI, Gemini), and potentially hardware for model execution. SDKs available for React, Android, iOS, Flutter, React Native, Unity.
  • Documentation: Guides and tutorials at VisionAgents.ai.
  • Resource Footprint: Not specified, but real-time video AI can be resource-intensive.
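Before running an example, it can help to confirm that the prerequisites listed above are in place. The following is a rough sketch using only the Python standard library; the environment variable names `OPENAI_API_KEY` and `GOOGLE_API_KEY` and the helper `missing_prerequisites` are assumptions for illustration, so check the project documentation for the names it actually expects.

```python
import os
from pathlib import Path


def missing_prerequisites(
    env,
    weights_path,
    required_keys=("OPENAI_API_KEY", "GOOGLE_API_KEY"),  # assumed names
):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    for key in required_keys:
        # Treat unset and empty values the same way.
        if not env.get(key):
            problems.append(f"environment variable {key} is not set")
    if not Path(weights_path).is_file():
        problems.append(f"model weights not found at {weights_path}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, "yolo11n-pose.pt"):
        print("missing:", problem)
```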

Highlighted Details

  • Real-time Performance: Engineered for low-latency video AI, claiming 500 ms join times and 30 ms A/V latency.
  • Model Agnosticism: Supports multiple LLMs (OpenAI, Gemini, Claude) and vision models (YOLO, Roboflow).
  • Extensive SDKs: Client libraries for web, mobile, and game development platforms.
  • Diverse Applications: Enables sports coaching, drone detection, physical therapy assistance, and invisible AI assistants.

Maintenance & Community

Developed by Stream, the project highlights key figures in AI research. A roadmap indicates ongoing development, with planned additions including broader model support (Roboflow, Qwen3, Moondream vision) and enhanced WebRTC capabilities. Community interaction channels are not explicitly detailed.

Licensing & Compatibility

The specific open-source license is not stated in the provided README, a critical omission for adoption assessment. Compatibility is emphasized with Stream Chat and various LLM/video providers.

Limitations & Caveats

The project appears to be in active development: several features are listed as "Coming Soon", and limitations in its underlying WebRTC library are acknowledged. Optimal performance relies on Stream's proprietary edge network, which may create vendor lock-in. The absence of explicit licensing information poses a significant adoption barrier.

Health Check

  • Last Commit: 5 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 75
  • Issues (30d): 9
  • Star History: 534 stars in the last 30 days
