Home Assistant integration for multimodal LLM vision
This project provides a Home Assistant integration for analyzing images and video streams with multimodal Large Language Models (LLMs). It is aimed at Home Assistant users who want to apply AI to camera feeds, video files, and recorded events, enabling object recognition, event summarization, and timeline tracking.
How It Works
LLM Vision integrates with a range of LLM providers, including OpenAI, Anthropic Claude, Google Gemini, and local solutions such as Ollama and LocalAI. It passes visual data (images, video files, live camera feeds) to the selected model to extract information, identify objects, people, or pets, and maintain a chronological timeline of events. The results can drive sensor updates and support natural-language queries about past events.
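As a rough illustration, the integration's analyzer services can be invoked through Home Assistant's standard REST API like any other service. The sketch below assumes the service name llmvision.image_analyzer and the provider, image_entity, message, and max_tokens fields from the integration's service schema; the URL, token, and camera entity are placeholders, and the exact field names should be verified against the installed version.

```python
# Minimal sketch: trigger an LLM Vision analysis from outside Home Assistant
# via its REST API. Service name and fields are assumptions based on the
# integration's documented schema; check them against your installed version.
import requests

HA_URL = "http://homeassistant.local:8123"   # your Home Assistant instance
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"       # Settings -> Profile -> Long-lived tokens

def analyze_camera(entity_id: str, prompt: str) -> dict:
    """Call the (assumed) llmvision.image_analyzer service on a camera entity."""
    resp = requests.post(
        # ?return_response asks Home Assistant to return the service's response data
        f"{HA_URL}/api/services/llmvision/image_analyzer?return_response",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "provider": "OpenAI",        # whichever provider entry you configured
            "image_entity": entity_id,   # camera whose snapshot is analyzed
            "message": prompt,           # instruction for the model
            "max_tokens": 100,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    result = analyze_camera("camera.front_door", "Describe what you see.")
    print(result)
```

Inside Home Assistant itself, the same service would typically be called from an automation or script rather than over HTTP.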
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats