machina by PsyChip

CCTV viewer for realtime object tagging

created 9 months ago
764 stars

Top 46.5% on sourcepulse

Project Summary

MACHINA is a video surveillance viewer that combines OpenCV, YOLO, and LLaVA to tag detected objects and caption scenes in real time. It targets users who monitor video streams and want both per-object labels and a description of the overall scene. The stated goal is a headless security solution.

How It Works

MACHINA connects to RTSP streams and processes frames in a separate thread. YOLO detects objects, and each object receives a unique ID based on its position and time. A background thread sends LLM requests to an Ollama server running LLaVA to tag the detected objects. For scene captioning, BLIP generates a caption every 30 frames, and CLIP matches these captions against pre-generated text every 10 frames, enabling real-time scene descriptions.
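
The tagging step can be pictured as one HTTP request per object crop. The following is a minimal sketch, not the project's code: it assumes Ollama's default local address and its /api/generate endpoint, and the function names, prompt, and timeout are hypothetical choices made for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_tag_request(jpeg_bytes, model="llava",
                      prompt="Describe this object in a few words."):
    """Build the JSON body for a non-streaming generate call with one image.

    The prompt text is a placeholder; the project's real prompt is not
    given in this summary.
    """
    return {
        "model": model,
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(jpeg_bytes).decode("ascii")],
        "stream": False,
    }


def request_tag(jpeg_bytes, url=OLLAMA_URL, timeout=30.0):
    """Send one crop to Ollama and return the generated text.

    Needs a running Ollama server with the model already pulled.
    """
    body = json.dumps(build_tag_request(jpeg_bytes)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

Running request_tag from a background thread keeps a slow LLM reply from stalling detection, which matches the threading split described above.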

Quick Start & Requirements

  • Install: Clone the repository, install dependencies (pip install -r requirements.txt), replace the CPU build of PyTorch with the CUDA build (pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118), then run the app (py app.py).
  • Prerequisites: Python 3.12.x, an Ollama server with the LLaVA model, CUDA-enabled PyTorch, and the Visual C++ redistributables (Windows).
  • Notes: CUDA-enabled PyTorch is essential for real-time performance. Adjust vsize based on the YOLO model used.
  • Links: Microsoft VC++ Redistributables
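
Because the CPU-to-CUDA swap is easy to get wrong, a quick check after installation can confirm which PyTorch build is active. This is a small sketch rather than part of the project; cuda_status is a hypothetical helper name.

```python
import importlib.util


def cuda_status():
    """Return a one-line description of the installed PyTorch build."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        name = torch.cuda.get_device_name(0)
        return f"CUDA build {torch.version.cuda}, device: {name}"
    return "torch is installed but CUDA is unavailable (CPU-only wheel or missing driver)"


print(cuda_status())
```

If the last message appears after installing from the cu118 index, the CPU wheel is probably still installed.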

Highlighted Details

  • Averages 600 ms per caption and 47 ms per caption match on an RTX 3060.
  • Processes frames at 640x480, achieving 20 ms inference time with YOLOv11-small on a GTX 1060.
  • Implements object matching based on detection box centers with a 16px tolerance.
  • Supports snapshotting (S), scene captioning (C), and recording (R).
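
The center-based matching above might look roughly like the sketch below. Several assumptions are made that the summary does not confirm: boxes are (x1, y1, x2, y2) tuples, the 16px tolerance is a Euclidean distance, and the time component of the ID assignment is left out. CenterTracker and box_center are hypothetical names.

```python
import math
from itertools import count

TOLERANCE_PX = 16  # tolerance quoted in the summary


def box_center(box):
    """Return the center of an (x1, y1, x2, y2) box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


class CenterTracker:
    """Keep object IDs stable by matching box centers between frames."""

    def __init__(self, tolerance=TOLERANCE_PX):
        self.tolerance = tolerance
        self.tracks = {}  # object id -> last known center
        self._ids = count(1)

    def update(self, boxes):
        """Return one ID per box, reusing the ID of the nearest known center."""
        unused = dict(self.tracks)  # each known track matches at most one box
        assigned = []
        for box in boxes:
            center = box_center(box)
            best_id, best_dist = None, self.tolerance
            for track_id, track_center in unused.items():
                dist = math.dist(center, track_center)
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                best_id = next(self._ids)  # nothing close enough: new object
            else:
                del unused[best_id]
            self.tracks[best_id] = center
            assigned.append(best_id)
        return assigned
```

A box whose center moves by less than the tolerance between frames keeps its ID, while anything farther away is treated as a new object. This sketch never expires old tracks, which a real tracker would need to do.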

Maintenance & Community

  • This is a personal project developed in spare time.
  • Feature requests can be prioritized with donations via Ko-fi or Bitcoin.
  • Contact: root@psychip.net

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • The project is marked as "[WIP]" (Work In Progress).
  • Network conditions can delay the stream; a frame-skip mechanism is implemented to limit the lag.
  • Pre-trained YOLO models may lack accuracy on low-resolution streams; custom training is recommended.
  • No explicit mention of supported operating systems beyond Windows prerequisites.
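
One common way to implement frame skipping is a small bounded buffer that discards the oldest frame whenever the consumer falls behind. The sketch below illustrates that general idea only; the summary does not describe how the project actually skips frames, and LatestFrameBuffer is a hypothetical name.

```python
import threading
from collections import deque


class LatestFrameBuffer:
    """Bounded frame buffer that discards the oldest frame when full."""

    def __init__(self, maxlen=2):
        self._frames = deque(maxlen=maxlen)
        self._lock = threading.Lock()
        self.dropped = 0  # number of frames skipped so far

    def put(self, frame):
        """Called by the reader thread for every decoded frame."""
        with self._lock:
            if len(self._frames) == self._frames.maxlen:
                self.dropped += 1  # deque evicts the oldest entry on append
            self._frames.append(frame)

    def get(self):
        """Return the oldest buffered frame, or None if the buffer is empty."""
        with self._lock:
            return self._frames.popleft() if self._frames else None
```

With a small maxlen, the detection loop always works on near-current frames, trading completeness for low latency.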

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 8 stars in the last 90 days
