live-vlm-webui by NVIDIA-AI-IOT

Real-time VLM interaction and benchmarking

Created 5 months ago
306 stars

Top 87.6% on SourcePulse

View on GitHub
Project Summary

live-vlm-webui is a universal web interface for real-time Vision Language Model (VLM) interaction via webcam, enabling live AI-powered analysis for benchmarking and exploration across diverse hardware. It targets engineers, researchers, and power users who want a streamlined platform for testing VLMs, offering cross-platform compatibility and flexible VLM backend integration.

How It Works

The system utilizes WebRTC for efficient, low-latency webcam streaming directly to the browser. It integrates with various VLM backends, including Ollama, vLLM, NVIDIA NIM, and cloud APIs, allowing users to select and configure their preferred models. The architecture supports asynchronous processing, ensuring a responsive user interface while VLMs analyze video frames in the background.
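The asynchronous pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `analyze_frame` stands in for whatever VLM backend call is configured, and the "latest frame wins" policy is an assumption about how stale frames are dropped to keep the UI responsive.

```python
import asyncio

async def analyze_frame(frame):
    """Stand-in for a VLM backend call (Ollama, vLLM, etc.) -- assumed."""
    await asyncio.sleep(0.05)          # simulate inference latency
    return f"description of {frame}"

async def stream_frames(latest, n):
    """Producer: overwrite a shared slot so only the newest frame is kept."""
    for i in range(n):
        latest["frame"] = f"frame-{i}"
        await asyncio.sleep(0.01)      # simulate camera frame interval
    latest["done"] = True

async def analyze_loop(latest, results):
    """Consumer: analyze whichever frame is newest, skipping stale ones."""
    while not latest.get("done"):
        frame = latest.get("frame")
        if frame is not None:
            results.append(await analyze_frame(frame))
        else:
            await asyncio.sleep(0.005)

async def main():
    latest, results = {}, []
    await asyncio.gather(stream_frames(latest, 10),
                         analyze_loop(latest, results))
    return results

results = asyncio.run(main())
```

Because inference is slower than the camera, the consumer naturally processes only a subset of frames; the stream never blocks waiting for the model.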

Quick Start & Requirements

  • Installation: Primarily via pip install live-vlm-webui for PC, Mac, DGX, and Jetson. Docker is recommended for Jetson and production deployments.
  • Execution: Run live-vlm-webui after pip installation or use ./scripts/start_container.sh for Docker. Access the UI at https://localhost:8090.
  • Prerequisites: A VLM backend (Ollama, vLLM, cloud API) is required. Jetson platforms need specific JetPack versions (6.x or 7.0) for pip installs; Docker is simpler for Jetson. macOS requires pip install due to Docker limitations. Windows users should use WSL2.
  • Links: Demo Video: https://github.com/user-attachments/assets/47a920da-b943-4494-9b28-c4ea86e192e4
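The steps above, condensed (pip route shown; the Docker route for Jetson uses the repo's start script):

```shell
# Install on PC / Mac / DGX / Jetson (Jetson needs JetPack 6.x or 7.0)
pip install live-vlm-webui

# Launch the server, then open https://localhost:8090 in a browser
live-vlm-webui

# Alternatively, for Jetson and production deployments, run via Docker
./scripts/start_container.sh
```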

Highlighted Details

  • Supports multiple VLM backends: Ollama, vLLM, NVIDIA NIM, NVIDIA API Catalog, OpenAI API.
  • Extensive platform support including Linux PC (x86_64), DGX Spark (ARM64), macOS (Apple Silicon), Windows (WSL2), and NVIDIA Jetson (Orin, Thor).
  • Features a modern, NVIDIA-themed UI with light/dark themes, real-time GPU/VRAM/CPU/RAM monitoring, and inference latency metrics.
  • Offers interactive prompt editing with presets and custom options, alongside asynchronous frame processing for a smooth user experience.
  • Designed for diverse use cases such as security, robotics, industrial automation, healthcare, and education.
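Several of the supported backends (Ollama, vLLM, NVIDIA NIM) expose OpenAI-compatible chat endpoints, so a per-frame vision request can be built the same way regardless of which backend is selected. A rough sketch of such a payload follows; the model name and the use of the standard OpenAI vision message schema are illustrative assumptions, not project defaults:

```python
import base64
import json

def build_vision_request(frame_bytes: bytes, prompt: str, model: str) -> dict:
    """Build an OpenAI-style chat payload with an inline base64 JPEG frame.
    This is the widely used OpenAI vision format, which OpenAI-compatible
    servers typically accept on their /v1/chat/completions route."""
    image_b64 = base64.b64encode(frame_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        "max_tokens": 128,
    }

# Hypothetical usage with a fake JPEG buffer and an assumed model name.
payload = build_vision_request(b"\xff\xd8fake-jpeg", "Describe the scene.", "llava")
body = json.dumps(payload)
```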

Maintenance & Community

The project originates from NVIDIA AI IoT. Specific details on community channels (e.g., Discord, Slack), active contributors, or sponsorships are not explicitly detailed in the provided README.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

  • Ollama version 0.12.10 is incompatible with Jetson Thor (JetPack 7.0); an earlier version is required.
  • Native Windows installation requires additional setup for FFmpeg and build tools; WSL2 is recommended.
  • Docker on macOS cannot access localhost, so a pip installation is needed there.
  • RTSP IP camera support is currently in Beta.
  • Jetson Thor requires pipx and specific jetson-stats installation steps for full functionality.
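Given the Ollama 0.12.10 incompatibility on Jetson Thor, a deployment script might guard against the known-bad release before starting the UI. A minimal sketch follows; in practice the version string would come from `ollama --version`, and rejecting 0.12.10 and newer until confirmed fixed is a conservative policy assumed here, not one the project prescribes:

```python
def parse_version(v: str) -> tuple:
    """Turn a version string like '0.12.10' into (0, 12, 10) for comparison."""
    return tuple(int(part) for part in v.strip().split("."))

# Ollama 0.12.10 is known-incompatible with Jetson Thor (JetPack 7.0);
# earlier releases work, per the project's caveats.
BAD_VERSION = (0, 12, 10)

def ollama_ok_for_thor(version: str) -> bool:
    """True if this Ollama release predates the known-incompatible one."""
    return parse_version(version) < BAD_VERSION

print(ollama_ok_for_thor("0.12.9"))   # -> True
print(ollama_ok_for_thor("0.12.10"))  # -> False
```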

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
1
Issues (30d)
0
Star History
40 stars in the last 30 days

