VisioFirm by OschAI

AI-powered annotation tool for accelerating computer vision workflows

Created 2 months ago · 373 stars · Top 75.9% on SourcePulse

Project Summary

VisioFirm is an open-source, AI-assisted annotation tool designed to accelerate computer vision dataset labeling. It targets researchers, data scientists, and ML engineers working with large image and video datasets, claiming time savings of up to 80% through semi-automated pre-annotation and label propagation. The tool streamlines workflows with a web interface, a FastAPI backend, and support for multiple annotation types and popular model formats.

How It Works

VisioFirm leverages state-of-the-art AI models for pre-annotation, including OpenAI CLIP for classification, SAM2 for segmentation, YOLO (v5-v12) for detection, and Grounding DINO for zero-shot object grounding. For video annotation, it offers label propagation via a SAM2-powered "SmartPropagator" or various OpenCV trackers, enabling frame-to-frame consistency. The system supports cross-domain annotation, allowing detection models to seed segmentation masks and vice versa. The backend has been migrated to FastAPI for improved performance, and a Python API is provided for pipeline integration.
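
To illustrate the detection side of this pre-annotation pass, the minimal sketch below runs an Ultralytics YOLO detector over a folder of images and prints candidate boxes that would serve as draft labels. It uses the Ultralytics API directly and is not VisioFirm's own Python API, whose exact interface is not documented here; the checkpoint name, source folder, and confidence threshold are placeholder assumptions.

    # Sketch of a detection pre-annotation pass using the Ultralytics API.
    # "yolov8n.pt", "images/", and conf=0.25 are placeholder assumptions.
    from ultralytics import YOLO

    model = YOLO("yolov8n.pt")  # any supported YOLO checkpoint (v5-v12)

    # Each result carries candidate boxes that a human annotator
    # would then confirm or correct in the annotation UI.
    results = model.predict(source="images/", conf=0.25)

    for result in results:
        for box in result.boxes:
            cls_name = model.names[int(box.cls)]
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            print(f"{result.path}: {cls_name} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")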

Quick Start & Requirements

  • Installation: pip install -U visiofirm
  • Development Install: Clone the repository (git clone https://github.com/OschAI/VisioFirm.git), navigate to the directory, and run pip install -e .
  • Launch: Execute visiofirm in the terminal.
  • Prerequisites: Python 3.10+. For v1, users must clear or rename existing cache directories (~/.cache/visiofirm_cache, ~/Library/Caches/visiofirm_cache, or %LOCALAPPDATA%\visiofirm_cache) before first run to avoid conflicts; a helper sketch follows this list.
  • Links: GitHub Repository (https://github.com/OschAI/VisioFirm)
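
The snippet below is a minimal sketch of the v1 cache-reset prerequisite: it renames any existing VisioFirm cache directory at the default locations listed above, preserving the old cache with a ".bak" suffix rather than deleting it. Adjust the paths if your system uses a non-default cache location.

    # Rename existing VisioFirm cache directories before the first v1 run.
    import os
    from pathlib import Path

    # Candidate cache locations, as listed in the prerequisites above.
    candidates = [
        Path.home() / ".cache" / "visiofirm_cache",              # Linux
        Path.home() / "Library" / "Caches" / "visiofirm_cache",  # macOS
    ]
    local_app_data = os.environ.get("LOCALAPPDATA")              # Windows
    if local_app_data:
        candidates.append(Path(local_app_data) / "visiofirm_cache")

    for cache_dir in candidates:
        if cache_dir.is_dir():
            backup = cache_dir.with_name(cache_dir.name + ".bak")
            cache_dir.rename(backup)
            print(f"Renamed {cache_dir} -> {backup}")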

Highlighted Details

  • AI-Driven Pre-Annotation: Utilizes YOLO, SAM2, Grounding DINO, and CLIP to automate object detection, segmentation, and classification, claiming up to 80% manual effort reduction.
  • Video Annotation & Label Propagation: Features a SAM2-based SmartPropagator and multiple OpenCV trackers for efficient video labeling; a tracker-based sketch follows this list.
  • Broad Model Support: Integrates with Ultralytics YOLO models (v5-v12) and YOLOv8-world for open-vocabulary pre-annotation.
  • Cross-Domain Annotation: Enables using detection models for segmentation pre-labeling and vice-versa.
  • WebGPU Annotation: Offers interactive, browser-based SAM2 segmentation via WebGPU, accelerating inference directly in the browser.
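
Tracker-based label propagation boils down to seeding a box on one frame and letting a tracker carry it forward frame by frame. The sketch below illustrates that general technique with OpenCV's CSRT tracker; it is not VisioFirm's internal implementation (which also offers the SAM2-based SmartPropagator), and the video path and initial box are placeholder values. It requires opencv-contrib-python.

    # Propagate one annotated box across a video with an OpenCV tracker.
    import cv2

    cap = cv2.VideoCapture("video.mp4")   # placeholder path
    ok, frame = cap.read()
    if not ok:
        raise RuntimeError("Could not read the first frame")

    init_box = (100, 100, 80, 120)        # (x, y, w, h) drawn by the annotator
    tracker = cv2.TrackerCSRT_create()
    tracker.init(frame, init_box)

    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        found, box = tracker.update(frame)
        if found:
            x, y, w, h = map(int, box)
            print(f"frame {frame_idx}: box=({x}, {y}, {w}, {h})")  # propagated label
        else:
            print(f"frame {frame_idx}: track lost, re-annotate here")
    cap.release()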

Maintenance & Community

The project is maintained by Safouane El Ghazouali. Bug reports and feature requests can be submitted via the GitHub Issues page. A Discord community and a documentation website are planned for the future.

Licensing & Compatibility

VisioFirm itself is licensed under Apache 2.0. However, it integrates third-party models under different licenses: Ultralytics YOLO is AGPL-3.0, while SAM2 and Grounding DINO are released under Apache 2.0 and BSD 3-Clause licenses. The AGPL-3.0 license of Ultralytics YOLO may impose copyleft obligations on derivative works or linked applications, which can affect closed-source commercial use.

Limitations & Caveats

Official documentation and community support (Discord) are listed as "SOON". The v1 release requires manual cache directory management to ensure proper initialization. The AGPL-3.0 license of a key dependency (Ultralytics YOLO) may present compatibility challenges for certain commercial or closed-source applications.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 8

Star History

41 stars in the last 30 days
