PyTorch code for zero-shot industrial anomaly detection
MuSc provides a PyTorch implementation for zero-shot industrial anomaly classification and segmentation. It targets researchers and engineers in industrial quality control, offering a prompt-free method that leverages unlabeled test images to identify anomalies. The core innovation is "Mutual Scoring," which exploits the observation that normal image patches have more similar counterparts in other unlabeled images than abnormal patches do.
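The mutual-scoring observation can be illustrated with a toy PyTorch sketch. It assumes patch features have already been extracted into a tensor of shape [N, P, C]. Each patch is scored by its Euclidean distance to the nearest patch in every other image, and only the smallest fraction of those distances is averaged (keep_ratio is a hypothetical parameter). The function name mutual_scores and all values are illustrative, not the repository's API, and MuSc's actual distance, aggregation, and ratios may differ.

```python
import torch


def mutual_scores(feats: torch.Tensor, keep_ratio: float = 0.3) -> torch.Tensor:
    """Toy mutual scoring (hypothetical helper, not the MuSc API).

    feats: [N, P, C] patch features of N unlabeled images with P patches each.
    Returns [N, P] patch scores; higher means more likely anomalous.
    """
    n, p, _ = feats.shape
    scores = torch.zeros(n, p)
    for i in range(n):
        per_ref = []
        for j in range(n):
            if j == i:
                continue  # an image is never scored against itself
            d = torch.cdist(feats[i], feats[j])  # [P, P] pairwise distances
            per_ref.append(d.min(dim=1).values)  # nearest patch in image j
        per_ref = torch.stack(per_ref)  # [N-1, P]
        # Keep only the smallest distances across reference images, so a normal
        # patch stays low even if a few references lack a close match for it.
        k = max(1, int(keep_ratio * per_ref.shape[0]))
        scores[i] = per_ref.topk(k, dim=0, largest=False).values.mean(dim=0)
    return scores


# Toy data: 5 "images" that share the same 16 patch prototypes plus small noise,
# with one patch of image 2 shifted far away to play the role of a defect.
torch.manual_seed(0)
base = torch.randn(16, 8)
feats = base.expand(5, 16, 8) + 0.01 * torch.randn(5, 16, 8)
feats[2, 7] += 5.0
scores = mutual_scores(feats)
print(scores[2].argmax().item())  # index of the most anomalous patch in image 2
```

The shifted patch has no close counterpart in any other image, so its score stays high. Every normal patch finds a near-duplicate in the other images and scores close to zero.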
How It Works
MuSc employs Local Neighborhood Aggregation with Multiple Degrees (LNAMD) to extract patch features that are robust to anomalies of varying sizes. The Mutual Scoring Mechanism (MSM) then uses these features to score the unlabeled images against each other, identifying anomalies without training. For classification, a Re-scoring with Constrained Image-level Neighborhood (RsCIN) module further refines the image-level scores by suppressing noise. Because the approach needs neither labeled anomaly data nor manual prompts, it can be applied directly to a new set of unlabeled test images.
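The LNAMD and re-scoring steps can be sketched in the same style. The first function assumes ViT patch tokens laid out on an H x W grid and approximates aggregation at degree r with r x r average pooling. The degrees 1, 3, and 5 are an assumption here, and MSM would then be run on each aggregated level. The second function is a simplified stand-in for RsCIN, not the paper's formulation: it blends each image's score with the mean score of its k most similar images, using cosine similarity of image-level features. All names and parameters (aggregate_neighbourhoods, rescore, k, alpha) are hypothetical.

```python
import torch
import torch.nn.functional as F


def aggregate_neighbourhoods(tokens, grid, degrees=(1, 3, 5)):
    """LNAMD-style aggregation (simplified, hypothetical helper).

    tokens: [N, H*W, C] patch tokens; grid: (H, W).
    Returns a list with one [N, H*W, C] tensor per aggregation degree r.
    """
    n, hw, c = tokens.shape
    h, w = grid
    x = tokens.transpose(1, 2).reshape(n, c, h, w)  # tokens back onto the 2D grid
    levels = []
    for r in degrees:
        # Average each patch with its r x r neighbours (r=1 leaves it unchanged);
        # border patches average only over neighbours that exist.
        y = F.avg_pool2d(x, kernel_size=r, stride=1, padding=r // 2,
                         count_include_pad=False)
        levels.append(y.reshape(n, c, hw).transpose(1, 2))
    return levels


def rescore(image_scores, image_feats, k=2, alpha=0.5):
    """Simplified neighbourhood re-scoring (a stand-in for RsCIN, not RsCIN itself).

    image_scores: [N] image-level scores; image_feats: [N, D] image-level features.
    """
    f = F.normalize(image_feats, dim=1)
    sim = f @ f.T  # cosine similarity between images
    sim.fill_diagonal_(float("-inf"))  # an image is not its own neighbour
    idx = sim.topk(k, dim=1).indices  # [N, k] most similar other images
    neighbour_mean = image_scores[idx].mean(dim=1)
    return alpha * image_scores + (1 - alpha) * neighbour_mean


tokens = torch.randn(2, 16, 4)  # 2 images, 4 x 4 patch grid, 4 channels
levels = aggregate_neighbourhoods(tokens, (4, 4))

# Image 1 looks like images 0 and 2 but received a spuriously high score.
image_feats = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]])
image_scores = torch.tensor([0.10, 0.90, 0.12, 0.11])
rescored = rescore(image_scores, image_feats)
```

Larger pooling windows let a patch "see" more of a big defect, while r=1 keeps small defects sharp. The re-scoring pulls image 1's isolated high score toward those of its visually similar neighbours.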
Quick Start & Requirements
Create a conda environment (conda create --name musc python=3.8), activate it (conda activate musc), and install the dependencies (pip install -r requirements.txt). PyTorch 2.0.1 and CUDA 11.7 are required. Datasets are expected in the ./data directory, and VisA requires preprocessing via python ./datasets/visa_preprocess.py. To run MuSc, use python examples/musc_main.py or scripts/musc.sh, configuring parameters such as --device, --data_path, --dataset_name, --class_name, and --backbone_name.
Highlighted Details
Maintenance & Community
The repository has received updates, including comparisons with newer SOTA methods and bug fixes, and the authors have been responsive to user questions. Links to the arXiv and OpenReview versions of the paper are provided.
Licensing & Compatibility
Released under the MIT License, allowing for academic research and free commercial usage. A commercial license can be obtained by contacting the authors.
Limitations & Caveats
The project lists several TODO items. These include reducing inference time (currently up to 955.3 ms per image with ViT-L-14-336 at an image resolution of 518) and improving compatibility with more datasets and backbones (e.g., Vision Mamba). If visualization normalization is not set to whole_norm, the heatmaps of normal images may highlight large areas.
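The whole_norm caveat comes down to where the min-max range is computed. The following is a small NumPy sketch. It assumes the alternative mode is per-image min-max scaling and that whole_norm shares one range across all test images. The repository's exact normalization may differ, and the function names are hypothetical. When normalized on its own, a normal image's nearly flat map is stretched to the full [0, 1] range, so background noise looks like a large anomaly.

```python
import numpy as np


def normalize_each(maps):
    """Min-max scale every anomaly map with its own range (hypothetical helper)."""
    lo = maps.min(axis=(1, 2), keepdims=True)
    hi = maps.max(axis=(1, 2), keepdims=True)
    return (maps - lo) / (hi - lo + 1e-8)


def normalize_shared(maps):
    """Min-max scale all maps with one range shared across the whole set."""
    return (maps - maps.min()) / (maps.max() - maps.min() + 1e-8)


rng = np.random.default_rng(0)
normal = 0.10 + 0.02 * rng.random((8, 8))  # flat, low scores: no defect
defect = 0.10 + 0.02 * rng.random((8, 8))
defect[2:4, 2:4] = 0.9  # a small, clearly anomalous region
maps = np.stack([normal, defect])

each = normalize_each(maps)
shared = normalize_shared(maps)
# Share of the normal image's pixels above 0.5 under each scheme.
print((each[0] > 0.5).mean(), (shared[0] > 0.5).mean())
```

Under the shared range, the normal map stays near zero and only the defect region of the second map reaches 1.0.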
1 year ago
Inactive