live-vlm-webui by NVIDIA-AI-IOT

Real-time VLM interaction and benchmarking

Created 5 months ago
306 stars

Top 87.6% on SourcePulse

View on GitHub
Project Summary

live-vlm-webui is a universal web interface for real-time Vision Language Model (VLM) interaction via webcam, enabling live AI-powered analysis for benchmarking and exploration across diverse hardware. It targets engineers, researchers, and power users who want a streamlined platform for testing VLMs, offering cross-platform compatibility and flexible VLM backend integration.

How It Works

The system utilizes WebRTC for efficient, low-latency webcam streaming directly to the browser. It integrates with various VLM backends, including Ollama, vLLM, NVIDIA NIM, and cloud APIs, allowing users to select and configure their preferred models. The architecture supports asynchronous processing, ensuring a responsive user interface while VLMs analyze video frames in the background.
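The asynchronous pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `analyze_frame` stands in for whatever VLM backend call is configured, and the "latest frame wins" policy is an assumption about how stale frames are dropped to keep the UI responsive.

```python
import asyncio

async def analyze_frame(frame):
    """Stand-in for a VLM backend call (Ollama, vLLM, etc.) -- assumed."""
    await asyncio.sleep(0.05)          # simulate inference latency
    return f"description of {frame}"

async def stream_frames(latest, n):
    """Producer: overwrite a shared slot so only the newest frame is kept."""
    for i in range(n):
        latest["frame"] = f"frame-{i}"
        await asyncio.sleep(0.01)      # simulate camera frame interval
    latest["done"] = True

async def analyze_loop(latest, results):
    """Consumer: analyze whichever frame is newest, skipping stale ones."""
    while not latest.get("done"):
        frame = latest.get("frame")
        if frame is not None:
            results.append(await analyze_frame(frame))
        else:
            await asyncio.sleep(0.005)

async def main():
    latest, results = {}, []
    await asyncio.gather(stream_frames(latest, 10),
                         analyze_loop(latest, results))
    return results

results = asyncio.run(main())
```

Because inference is slower than the camera, the consumer naturally processes only a subset of frames; the stream never blocks waiting for the model.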

Quick Start & Requirements

  • Installation: Primarily via pip install live-vlm-webui for PC, Mac, DGX, and Jetson. Docker is recommended for Jetson and production deployments.
  • Execution: Run live-vlm-webui after pip installation or use ./scripts/start_container.sh for Docker. Access the UI at https://localhost:8090.
  • Prerequisites: A VLM backend (Ollama, vLLM, cloud API) is required. Jetson platforms need specific JetPack versions (6.x or 7.0) for pip installs; Docker is simpler for Jetson. macOS requires pip install due to Docker limitations. Windows users should use WSL2.
  • Links: Demo Video: https://github.com/user-attachments/assets/47a920da-b943-4494-9b28-c4ea86e192e4
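The steps above, condensed (pip route shown; the Docker route for Jetson uses the repo's start script):

```shell
# Install on PC / Mac / DGX / Jetson (Jetson needs JetPack 6.x or 7.0)
pip install live-vlm-webui

# Launch the server, then open https://localhost:8090 in a browser
live-vlm-webui

# Alternatively, for Jetson and production deployments, run via Docker
./scripts/start_container.sh
```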

Highlighted Details

  • Supports multiple VLM backends: Ollama, vLLM, NVIDIA NIM, NVIDIA API Catalog, OpenAI API.
  • Extensive platform support including Linux PC (x86_64), DGX Spark (ARM64), macOS (Apple Silicon), Windows (WSL2), and NVIDIA Jetson (Orin, Thor).
  • Features a modern, NVIDIA-themed UI with light/dark themes, real-time GPU/VRAM/CPU/RAM monitoring, and inference latency metrics.
  • Offers interactive prompt editing with presets and custom options, alongside asynchronous frame processing for a smooth user experience.
  • Designed for diverse use cases such as security, robotics, industrial automation, healthcare, and education.
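Several of the supported backends (Ollama, vLLM, NVIDIA NIM) expose OpenAI-compatible chat endpoints, so a per-frame vision request can be built the same way regardless of which backend is selected. A rough sketch of such a payload follows; the model name and the use of the standard OpenAI vision message schema are illustrative assumptions, not project defaults:

```python
import base64
import json

def build_vision_request(frame_bytes: bytes, prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat payload with an inline base64 JPEG frame.
    This is the widely used OpenAI vision format, which OpenAI-compatible
    servers typically accept on their /v1/chat/completions route."""
    image_b64 = base64.b64encode(frame_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 128,
    }

# Hypothetical usage with a fake JPEG buffer and an assumed model name.
payload = build_vision_request(b"\xff\xd8fake-jpeg", "Describe the scene.", "llava")
body = json.dumps(payload)
```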

Maintenance & Community

The project originates from NVIDIA AI IoT. Specific details on community channels (e.g., Discord, Slack), active contributors, or sponsorships are not explicitly detailed in the provided README.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

  • Ollama version 0.12.10 is incompatible with Jetson Thor (JetPack 7.0); an earlier version is required.
  • Native Windows installation requires additional setup for FFmpeg and build tools; WSL2 is recommended.
  • Docker on macOS cannot access localhost, so a pip installation is needed there.
  • RTSP IP camera support is currently in Beta.
  • Jetson Thor requires pipx and specific jetson-stats installation steps for full functionality.
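Given the Ollama 0.12.10 incompatibility on Jetson Thor, a deployment script might guard against the known-bad release before starting the UI. A minimal sketch follows; in practice the version string would come from `ollama --version`, and rejecting 0.12.10 and newer until confirmed fixed is a conservative policy assumed here, not one the project prescribes:

```python
def parse_version(v: str) -> tuple:
    """Turn a version string like '0.12.10' into (0, 12, 10) for comparison."""
    return tuple(int(part) for part in v.strip().split("."))

# Ollama 0.12.10 is known-incompatible with Jetson Thor (JetPack 7.0);
# earlier releases work, per the project's caveats.
BAD_VERSION = (0, 12, 10)

def ollama_ok_for_thor(version: str) -> bool:
    """True if this Ollama release predates the known-incompatible one."""
    return parse_version(version) < BAD_VERSION

print(ollama_ok_for_thor("0.12.9"))   # -> True
print(ollama_ok_for_thor("0.12.10"))  # -> False
```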

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
0
Star History
40 stars in the last 30 days

