MMAD by jam-cc

Benchmark for multimodal LLMs in industrial anomaly detection

Created 1 year ago

264 stars

Top 96.5% on SourcePulse

Project Summary

MMAD is the first comprehensive benchmark evaluating Multimodal Large Language Models (MLLMs) for industrial anomaly detection. It addresses the lack of systematic study on MLLMs' industrial quality inspection capabilities, offering a quantitative assessment to identify top models and challenges. This benchmark is vital for researchers and practitioners integrating MLLMs into industrial settings.

How It Works

The project defines seven key subtasks for MLLMs in industrial inspection and builds a dataset of 39,672 questions over 8,366 industrial images spanning 38 classes. Evaluation follows three steps: prepare the data, configure the MLLM under test, and run the evaluation scripts. The authors also release textual domain knowledge to accompany the benchmark.
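As a rough illustration of what a per-subtask score could look like, the sketch below aggregates accuracy over a list of answered questions. It is not the repository's evaluation script. The record fields (`subtask`, `answer`, `prediction`), the placeholder subtask names, and the assumption that each question is scored by exact match against a reference option letter are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_subtask(records):
    """Return {subtask: accuracy}, plus the unweighted mean under 'average'.

    Each record is a dict with hypothetical keys:
      'subtask'    - name of the subtask the question belongs to
      'answer'     - reference option letter
      'prediction' - option letter produced by the model
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        total[rec["subtask"]] += 1
        # Compare case-insensitively and ignore surrounding whitespace.
        if rec["prediction"].strip().upper() == rec["answer"].strip().upper():
            correct[rec["subtask"]] += 1

    scores = {name: correct[name] / total[name] for name in total}
    if scores:
        # Unweighted mean: every subtask counts equally, regardless of size.
        scores["average"] = sum(scores.values()) / len(scores)
    return scores


# Hypothetical records; real subtask names are not listed in this summary.
sample = [
    {"subtask": "task_a", "answer": "A", "prediction": "a"},
    {"subtask": "task_a", "answer": "B", "prediction": "C"},
    {"subtask": "task_b", "answer": "D", "prediction": "D "},
]
scores = accuracy_by_subtask(sample)
```

Averaging per subtask rather than per question keeps a large subtask from dominating the headline number. Whether the benchmark itself reports results that way is not stated here.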

Quick Start & Requirements

Install git-lfs, then clone the dataset from Hugging Face (git clone https://huggingface.co/datasets/jiang-cc/MMAD). Requirements for model evaluation depend on the model: Gemini and GPT4 need API keys; Cambrian, LLaVA, and SPHINX need their own environments, set up by following their external repositories; other models need only pip install transformers. Evaluation scripts are in ./evaluation/examples. Links to external model repos are provided.
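The clone step can also be scripted. The sketch below is a minimal Python wrapper that checks for git and git-lfs before cloning. The helper names (`build_clone_command`, `clone_dataset`) and the default target directory `MMAD` are illustrative assumptions. It also assumes git-lfs is installed as a `git-lfs` executable on PATH, which is the usual layout but may differ on some systems.

```python
import shutil
import subprocess

# Dataset URL as given in the quick-start text.
DATASET_URL = "https://huggingface.co/datasets/jiang-cc/MMAD"


def build_clone_command(url=DATASET_URL, target_dir="MMAD"):
    """Return the git command as an argument list (no shell involved)."""
    return ["git", "clone", url, target_dir]


def clone_dataset(url=DATASET_URL, target_dir="MMAD"):
    """Clone the dataset after checking that git and git-lfs are available."""
    if shutil.which("git") is None:
        raise RuntimeError("git was not found on PATH")
    if shutil.which("git-lfs") is None:
        # Without git-lfs, large files may arrive as small pointer files.
        raise RuntimeError("git-lfs was not found on PATH; install it first")
    # check=True raises CalledProcessError if the clone fails.
    subprocess.run(build_clone_command(url, target_dir), check=True)
```

Calling `clone_dataset()` performs the download. Passing a different `target_dir` places the dataset elsewhere.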

Highlighted Details

First full-spectrum benchmark for MLLMs in industrial anomaly detection.
Dataset: 39,672 questions, 8,366 images, 7 subtasks.
Comprehensive, quantitative MLLM evaluations.
Released textual domain knowledge for anomaly detection categories.
Added support for Qwen2.5-VL series (as of 2025-07-08).

Maintenance & Community

Recent updates include dataset streamlining plans (2026-01-14), Qwen2.5-VL support (2025-07-08), and experiment results release (2025-05-26). The project has published its ICLR'25 paper, human baseline, domain knowledge, and dataset on Hugging Face. No direct community links are provided.

Licensing & Compatibility

The repository's README does not specify a software license, necessitating further investigation for commercial use or integration into closed-source projects.

Limitations & Caveats

Ongoing efforts aim to streamline the dataset and reduce errors, which suggests the current release may contain data quality issues. Setting up certain MLLMs (Cambrian, LLaVA, SPHINX) requires following the instructions in their original repositories, which adds setup complexity.

Health Check

Last Commit

5 months ago

Responsiveness

Inactive

Star History

5 stars in the last 30 days