MuSc by xrli-U

PyTorch code for zero-shot industrial anomaly detection

Created 1 year ago

424 stars

Top 69.5% on SourcePulse

Project Summary

MuSc provides a PyTorch implementation for zero-shot industrial anomaly classification and segmentation. It targets researchers and engineers in industrial quality control, offering a prompt-free method that leverages unlabeled test images to identify anomalies. The core innovation is "Mutual Scoring," which exploits the observation that normal image patches have more similar counterparts in other unlabeled images than abnormal patches do.

How It Works

MuSc employs Local Neighborhood Aggregation with Multiple Degrees (LNAMD) to extract patch features robust to varying anomaly sizes. The Mutual Scoring Mechanism (MSM) then uses these features to score images against each other, identifying anomalies without explicit training. For classification, a Re-scoring with Constrained Image-level Neighborhood (RsCIN) module further refines scores by suppressing noise. This approach avoids the need for labeled anomaly data or manual prompts, making it highly adaptable.
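The mutual scoring idea can be illustrated with a toy sketch. This is not the repository's implementation: it assumes patch features have already been extracted, uses plain Euclidean distance, and introduces hypothetical names (`mutual_scores`, `keep_fraction`, `image_score`). Each patch is scored by its nearest-neighbour distance to every other unlabeled image, and the smallest of those distances are averaged, so a patch with close counterparts elsewhere gets a low score.

```python
import math


def patch_distance(a, b):
    """Euclidean distance between two patch feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mutual_scores(images, keep_fraction=0.3):
    """Score each patch of each image against all other unlabeled images.

    images: list of images, each a list of patch feature vectors.
    keep_fraction: hypothetical parameter (an assumption of this sketch);
        the share of closest reference images averaged per patch.
    Returns one list of patch scores per image.
    """
    all_scores = []
    for i, patches in enumerate(images):
        others = [img for j, img in enumerate(images) if j != i]
        keep = max(1, round(keep_fraction * len(others)))
        image_patch_scores = []
        for p in patches:
            # Nearest-neighbour distance from this patch to each other image.
            nearest = sorted(
                min(patch_distance(p, q) for q in other) for other in others
            )
            # Average only the closest reference images.
            image_patch_scores.append(sum(nearest[:keep]) / keep)
        all_scores.append(image_patch_scores)
    return all_scores


def image_score(patch_scores):
    """Image-level score taken as the largest patch score (an assumption)."""
    return max(patch_scores)


# Toy data: four images with two 2-D patch features each.
# The last image contains one patch far from everything else.
images = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.1, 0.0], [1.0, 0.9]],
    [[0.0, 0.1], [0.9, 1.0]],
    [[0.0, 0.0], [5.0, 5.0]],
]

patch_scores = mutual_scores(images)
image_scores = [image_score(s) for s in patch_scores]
print(image_scores)
```

In this toy set the fourth image receives the highest image-level score, because its second patch has no close counterpart in any other image. No training step or labeled anomaly is involved at any point.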

Quick Start & Requirements

Install: Clone the repository, create a conda environment (conda create --name musc python=3.8), activate it (conda activate musc), and install dependencies (pip install -r requirements.txt). PyTorch 2.0.1 and CUDA 11.7 are required.
Datasets: Download MVTec AD, VisA, and BTAD datasets and place them in the ./data directory. VisA requires preprocessing via python ./datasets/visa_preprocess.py.
Run: Execute python examples/musc_main.py or use scripts/musc.sh, configuring parameters like --device, --data_path, --dataset_name, --class_name, and --backbone_name.
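As a hedged illustration of the run step, the sketch below assembles the command line from Python. The script path and flag names come from the list above; the helper `build_musc_command` and every argument value (device index, dataset name, class name) are placeholders assumed for illustration, so check the repository's scripts/musc.sh for the values it actually accepts.

```python
import shlex


def build_musc_command(device, data_path, dataset_name, class_name, backbone_name):
    """Assemble the argument list for examples/musc_main.py.

    Hypothetical helper: only the flag names are taken from the project
    summary; the values passed in are the caller's responsibility.
    """
    return [
        "python", "examples/musc_main.py",
        "--device", str(device),
        "--data_path", data_path,
        "--dataset_name", dataset_name,
        "--class_name", class_name,
        "--backbone_name", backbone_name,
    ]


# Placeholder values, assumed for illustration only.
cmd = build_musc_command(
    device=0,
    data_path="./data",
    dataset_name="mvtec_ad",
    class_name="bottle",
    backbone_name="ViT-L-14-336",
)

# Print a shell-safe version of the command; pass `cmd` to subprocess.run
# from the repository root to actually launch it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.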

Highlighted Details

Achieves state-of-the-art zero-shot performance, outperforming most few-shot methods on MVTec AD and VisA datasets.
Demonstrates significant gains over prior zero-shot methods, e.g., +21.1% PRO on MVTec AD.
Supports various backbones including CLIP (ViT-B/L variants), DINO, and DINO_v2.
Provides detailed results and inference times for different backbones and image sizes.

Maintenance & Community

The project reports updates including comparisons with new SOTA methods and bug fixes, although the health check below records the last commit as 1 year ago. The authors are described as responsive to user questions. Links to arXiv and OpenReview are provided.

Licensing & Compatibility

Released under the MIT License, allowing for academic research and free commercial usage. A commercial license can be obtained by contacting the authors.

Limitations & Caveats

The project lists several TODO items, including reducing inference time (currently up to 955.3 ms per image with ViT-L-14-336 at an image size of 518) and improving compatibility with more datasets and backbones (e.g., Vision Mamba). If visualization normalization is not set to whole_norm, the anomaly maps of normal images may show large highlighted areas.
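The normalization caveat can be illustrated with a small sketch. It assumes that whole_norm means min-max scaling with statistics taken over all anomaly maps rather than per image; that reading, and the function names below, are assumptions and not taken from the repository. Per-image scaling stretches even the tiny score variations of a normal image to the full range, which is what makes large areas appear highlighted.

```python
def min_max(values, lo, hi):
    """Scale values to [0, 1] using the given minimum and maximum."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def normalize_per_image(score_maps):
    """Normalize each anomaly map with its own minimum and maximum."""
    return [min_max(m, min(m), max(m)) for m in score_maps]


def normalize_whole(score_maps):
    """Normalize all anomaly maps with one shared minimum and maximum."""
    lo = min(min(m) for m in score_maps)
    hi = max(max(m) for m in score_maps)
    return [min_max(m, lo, hi) for m in score_maps]


# Toy flattened anomaly maps: one normal image, one with a clear defect.
normal_map = [0.10, 0.12, 0.11, 0.13]
anomalous_map = [0.10, 0.12, 0.90, 0.11]

per_image = normalize_per_image([normal_map, anomalous_map])
whole = normalize_whole([normal_map, anomalous_map])

print("per-image, normal:", per_image[0])
print("whole-set, normal:", whole[0])
```

With per-image scaling the normal map reaches the top of the colour range despite containing no defect, while the shared scaling keeps it near zero and reserves the high end for the defective image.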

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days