clifs by johanmodin

Video search via text queries using CLIP

created 4 years ago
474 stars

Project Summary

CLIFS (Contrastive Language-Image Forensic Search) is a proof-of-concept tool for performing free-text searches within video content. It leverages OpenAI's CLIP model to match textual queries with visual frames, enabling users to find specific scenes or objects in videos using natural language. This is particularly useful for forensic analysis or content discovery in large video datasets.

How It Works

CLIFS extracts features from video frames using CLIP's image encoder, and encodes each search query with CLIP's text encoder into a feature vector in the same embedding space. The query features are then compared with the frame features, and frames whose similarity score exceeds a defined threshold are returned as matches, identifying the relevant video segments. A Django web server provides a user interface for the search engine.
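The matching step can be sketched as follows. This is a minimal illustration, not CLIFS's actual code: it assumes the frame and query features have already been produced by CLIP's encoders, uses cosine similarity as the similarity measure, and uses toy 3-dimensional vectors in place of real CLIP embeddings. The names `normalize` and `search_frames` and the `fps` parameter (used only to turn a frame index into a timestamp) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def search_frames(
    frame_features: Sequence[Sequence[float]],
    query_feature: Sequence[float],
    threshold: float,
    fps: float,
) -> List[Tuple[int, float, float]]:
    """Return (frame_index, timestamp_s, score) for every frame whose cosine
    similarity to the query exceeds `threshold`, best match first.

    Hypothetical helper: `fps` is the assumed rate at which frames were sampled.
    """
    q = normalize(query_feature)
    hits = []
    for idx, feat in enumerate(frame_features):
        f = normalize(feat)
        score = sum(a * b for a, b in zip(f, q))
        if score > threshold:
            # Convert the frame index into a position in the video (seconds).
            hits.append((idx, idx / fps, score))
    # Highest similarity first.
    hits.sort(key=lambda hit: hit[2], reverse=True)
    return hits


if __name__ == "__main__":
    # Toy "frame features": the first two point roughly the same way as the query.
    frames = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
    for idx, ts, score in search_frames(frames, [1.0, 0.0, 0.0], threshold=0.5, fps=2.0):
        print(f"frame {idx} at {ts:.1f}s, similarity {score:.3f}")
```

Normalizing both sides makes scores comparable across frames regardless of vector magnitude; in a real index the frame features would typically be stored pre-normalized, so only the query needs normalizing per search.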

Quick Start & Requirements

  • Install and run via Docker Compose: run sh ./setup.sh, then docker-compose build && docker-compose up.
  • For GPU support, start the containers with docker-compose -f docker-compose-gpu.yml up instead.
  • Place video files in the data/input directory.
  • Access the interface at 127.0.0.1:8000.
  • Requires Docker; GPU support additionally requires NVIDIA GPU drivers.

Highlighted Details

  • Utilizes OpenAI's CLIP model for language-image matching.
  • Can match queries against text visible in video frames, a basic OCR-like (Optical Character Recognition) capability.
  • Demonstrates searching for specific objects, text, and descriptions (e.g., "A truck with the text 'odwalla'", "A white BMW car").

Maintenance & Community

No specific information on contributors, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The project is described as a proof of concept, which suggests limited robustness and scalability. The README does not report performance benchmarks or list unsupported video formats.

Health Check
  • Last commit: 3 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days

Explore Similar Projects

clip-retrieval by rom1504 (3k stars): CLIP retrieval system for semantic search. Created 4 years ago; updated 1 year ago. Starred by John Resig (author of jQuery; Chief Software Architect at Khan Academy), Chenlin Meng (cofounder of Pika), and 4 more.