machina by PsyChip

CCTV viewer for realtime object tagging

Created 11 months ago
776 stars

Top 45.1% on SourcePulse

Project Summary

MACHINA is a video surveillance system that combines OpenCV, YOLO, and LLaVA for real-time object tagging and scene captioning. It targets users who monitor video streams and want information about detected objects and the overall scene context. The project's goal is a headless security solution.

How It Works

MACHINA connects to RTSP streams and processes frames in a separate thread. YOLO detects objects, and each detection receives a unique ID based on its position and time. A background thread sends requests to an LLM (LLaVA served by an Ollama server) to tag the detected objects. For scene captioning, BLIP generates captions every 30 frames, and CLIP matches these captions against pre-generated text every 10 frames, which enables real-time scene descriptions.
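
As an illustration of the tagging step, the sketch below builds a request for Ollama's /api/generate endpoint with a base64-encoded image crop. The model name "llava", the default address http://localhost:11434, the prompt text, and the helper names build_tag_request and request_tag are assumptions for illustration; they are not taken from the MACHINA source.

```python
import base64
import json
import urllib.request

# Assumed default address of a local Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_tag_request(jpeg_bytes, prompt="Describe this object in three words.", model="llava"):
    """Build the JSON payload for a single, non-streaming LLaVA request."""
    return {
        "model": model,    # assumed model tag
        "prompt": prompt,  # hypothetical prompt
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,   # ask for one complete JSON response
    }


def request_tag(jpeg_bytes, url=OLLAMA_URL, timeout=30.0):
    """POST the crop to Ollama and return the generated text (needs a running server)."""
    body = json.dumps(build_tag_request(jpeg_bytes)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

Calling request_tag from a worker thread, rather than from the frame loop, keeps slow LLM responses from stalling detection.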

Quick Start & Requirements

  • Install: Clone the repository, install dependencies (pip install -r requirements.txt), uninstall the CPU-only PyTorch build and install the CUDA build (pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118), then run the app (py app.py).
  • Prerequisites: Python 3.12.x, an Ollama server with the LLaVA model, CUDA-enabled PyTorch, Visual C++ redistributables (Windows).
  • Notes: CUDA-enabled PyTorch is essential for real-time performance. Adjust vsize based on the YOLO model used.
  • Links: Microsoft VC++ Redistributables
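
Before running the app, it can help to confirm that the CUDA build of PyTorch is the one actually installed. The check below is a minimal sketch; the function name torch_backend and its return strings are made up for illustration.

```python
import importlib.util


def torch_backend():
    """Return 'cuda', 'cpu-only', or 'torch-missing' for the current environment."""
    if importlib.util.find_spec("torch") is None:
        return "torch-missing"
    import torch  # imported lazily so the check also runs without PyTorch installed

    return "cuda" if torch.cuda.is_available() else "cpu-only"


if __name__ == "__main__":
    print(torch_backend())
```

A result of "cpu-only" after installation suggests the CPU wheel is still in place and the uninstall/reinstall step needs repeating.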

Highlighted Details

  • Averages 600 ms for captioning and 47 ms for caption matching on an RTX 3060.
  • Processes frames at 640x480, achieving 20 ms inference time with YOLOv11-small on a GTX 1060.
  • Implements object matching based on detection box centers with a 16px tolerance.
  • Supports snapshotting (S), scene captioning (C), and recording (R).
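
The center-based matching mentioned above can be sketched as follows. This is one possible reading, not the project's code: it assumes boxes in (x1, y1, x2, y2) form, applies the 16 px tolerance per axis, and drops tracks after a hypothetical max_age of 2 seconds. The class name CenterTracker is also an assumption.

```python
import itertools

TOLERANCE_PX = 16  # tolerance stated in the project summary


def center(box):
    """Center (cx, cy) of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


class CenterTracker:
    """Assigns IDs by matching box centers between consecutive updates."""

    def __init__(self, tolerance=TOLERANCE_PX, max_age=2.0):
        self.tolerance = tolerance
        self.max_age = max_age  # seconds; hypothetical expiry window
        self.tracks = {}        # id -> (cx, cy, last_seen)
        self._ids = itertools.count(1)

    def update(self, boxes, now):
        """Return one ID per box, reusing IDs of nearby, recently seen centers."""
        # Forget tracks that have not been seen within max_age seconds.
        self.tracks = {
            tid: t for tid, t in self.tracks.items() if now - t[2] <= self.max_age
        }
        ids, used = [], set()
        for box in boxes:
            cx, cy = center(box)
            match = None
            for tid, (tx, ty, _) in self.tracks.items():
                if tid in used:
                    continue  # each track matches at most one box per update
                if abs(cx - tx) <= self.tolerance and abs(cy - ty) <= self.tolerance:
                    match = tid
                    break
            if match is None:
                match = next(self._ids)  # no nearby track: start a new ID
            used.add(match)
            self.tracks[match] = (cx, cy, now)
            ids.append(match)
        return ids
```

A fixed pixel tolerance is cheap to compute but depends on frame rate and resolution: an object that moves more than 16 px between processed frames receives a new ID in this sketch.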

Maintenance & Community

  • This is a personal project developed in spare time.
  • Feature requests can be prioritized with donations via Ko-fi or Bitcoin.
  • Contact: root@psychip.net

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • The project is marked as "[WIP]" (Work In Progress).
  • Network conditions can delay the stream; a frame skip mechanism is implemented to deal with this.
  • Pre-trained YOLO models may lack accuracy on low-resolution streams; custom training is recommended.
  • No explicit mention of supported operating systems beyond Windows prerequisites.
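
One common way to implement frame skipping is to read the stream in a background thread and keep only the newest frame, so a slow consumer never works through a backlog. The sketch below shows that pattern under stated assumptions; it is not MACHINA's actual mechanism. The class name LatestFrameReader is hypothetical, and capture can be any object whose read() returns (ok, frame), such as an OpenCV cv2.VideoCapture opened on an RTSP URL.

```python
import threading
import time


class LatestFrameReader:
    """Reads from a capture object in a background thread, keeping only the newest frame."""

    def __init__(self, capture):
        self.capture = capture  # any object with read() -> (ok, frame)
        self._lock = threading.Lock()
        self._frame = None
        self._seq = 0           # number of frames read so far
        self._running = True
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while self._running:
            ok, frame = self.capture.read()
            if not ok:
                time.sleep(0.01)  # brief pause before retrying a failed read
                continue
            with self._lock:
                # Overwrite the previous frame: anything the consumer has not
                # picked up yet is skipped.
                self._frame = frame
                self._seq += 1

    def latest(self):
        """Return (sequence number, newest frame); the frame is None before the first read."""
        with self._lock:
            return self._seq, self._frame

    def stop(self):
        self._running = False
        self._thread.join(timeout=1.0)
```

Comparing sequence numbers between two calls to latest() tells the consumer how many frames were skipped in between.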

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
