Video search via text queries using CLIP
CLIFS (Contrastive Language-Image Forensic Search) is a proof-of-concept tool for performing free-text searches within video content. It leverages OpenAI's CLIP model to match textual queries with visual frames, enabling users to find specific scenes or objects in videos using natural language. This is particularly useful for forensic analysis or content discovery in large video datasets.
How It Works
CLIFS extracts features from video frames using CLIP's image encoder. Search queries are processed by CLIP's text encoder to generate corresponding features. Similarity matching between frame and query features identifies relevant video segments. Results exceeding a defined similarity threshold are returned. A Django web server provides a user interface for interacting with the search engine.
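A minimal sketch of that pipeline is shown below, assuming OpenAI's reference `clip` package and OpenCV for frame sampling. The model variant (ViT-B/32), sampling interval, and similarity threshold are illustrative assumptions, not CLIFS's actual settings.

```python
# Sketch: sample video frames, encode them with CLIP, and match against a text query.
import cv2
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # model choice is an assumption

def encode_frames(video_path, every_n=30):
    """Sample every Nth frame and return normalized CLIP image features plus timestamps."""
    cap = cv2.VideoCapture(video_path)
    features, timestamps = [], []
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            image = preprocess(Image.fromarray(rgb)).unsqueeze(0).to(device)
            with torch.no_grad():
                feat = model.encode_image(image)
            features.append(feat / feat.norm(dim=-1, keepdim=True))
            timestamps.append(cap.get(cv2.CAP_PROP_POS_MSEC) / 1000.0)  # approx. time in seconds
        idx += 1
    cap.release()
    return torch.cat(features), timestamps

def search(query, frame_features, timestamps, threshold=0.3):
    """Return (timestamp, score) pairs whose cosine similarity exceeds the threshold."""
    tokens = clip.tokenize([query]).to(device)
    with torch.no_grad():
        text_feat = model.encode_text(tokens)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)
    scores = (frame_features @ text_feat.T).squeeze(1)  # cosine similarity per frame
    return [(t, s.item()) for t, s in zip(timestamps, scores) if s > threshold]
```

In practice the frame features would be computed once per video and cached, so each query only requires a single text-encoder pass and a matrix multiply over the stored frame features.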
Quick Start & Requirements
Run the setup script:

```sh
./setup.sh
```

followed by building and starting the services:

```sh
docker-compose build && docker-compose up
```

For GPU support, use the GPU compose file instead:

```sh
docker-compose -f docker-compose-gpu.yml up
```

Place the videos to be searched in the data/input directory, then open the search interface at 127.0.0.1:8000.
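Once the containers are running, a quick way to confirm the web interface is reachable (a hedged sketch; only the 127.0.0.1:8000 address comes from the README, and no specific endpoints are documented):

```python
# Smoke test: check that the Django web UI responds at the documented address.
# The root path "/" is an assumption; the README only gives 127.0.0.1:8000.
import requests

resp = requests.get("http://127.0.0.1:8000/", timeout=5)
print(resp.status_code)  # expect 200 once the docker-compose services are up
```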
Highlighted Details
Maintenance & Community
No specific information on contributors, community channels, or roadmap is provided in the README.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
This is described as a proof-of-concept, suggesting potential limitations in robustness and scalability. The README does not detail performance benchmarks or which video formats are supported.