Home Assistant integration for multimodal LLM vision
This project provides a Home Assistant integration for analyzing images and video streams with multimodal Large Language Models (LLMs). It is aimed at Home Assistant users who want to apply AI to camera feeds, video files, and recorded events, enabling object recognition, event summarization, and timeline tracking.
How It Works
LLM Vision integrates with a range of LLM providers, including OpenAI, Anthropic Claude, Google Gemini, and local solutions such as Ollama and LocalAI. It passes visual data (images, video files, live camera feeds) to the selected model to extract information, identify objects, people, or pets, and maintain a chronological timeline of events. The results can drive sensor updates and support natural-language queries about past events.
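As a rough illustration, the integration's analyzer services can be invoked through Home Assistant's standard REST API like any other service. The sketch below assumes the service name llmvision.image_analyzer and the provider, image_entity, message, and max_tokens fields from the integration's service schema; the URL, token, and camera entity are placeholders, and the exact field names should be verified against the installed version.

```python
# Minimal sketch: trigger an LLM Vision analysis from outside Home Assistant
# via its REST API. Service name and fields are assumptions based on the
# integration's documented schema; check them against your installed version.
import requests

HA_URL = "http://homeassistant.local:8123"   # your Home Assistant instance
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"       # Settings -> Profile -> Long-lived tokens

def analyze_camera(entity_id: str, prompt: str) -> dict:
    """Call the (assumed) llmvision.image_analyzer service on a camera entity."""
    resp = requests.post(
        # ?return_response asks Home Assistant to return the service's response data
        f"{HA_URL}/api/services/llmvision/image_analyzer?return_response",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "provider": "OpenAI",        # whichever provider entry you configured
            "image_entity": entity_id,   # camera whose snapshot is analyzed
            "message": prompt,           # instruction for the model
            "max_tokens": 100,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    result = analyze_camera("camera.front_door", "Describe what you see.")
    print(result)
```

Inside Home Assistant itself, the same service would typically be called from an automation or script rather than over HTTP.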
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats