FakeShield by zhipeixu

Explainable image forgery detection and localization using MLLMs

Created 1 year ago

428 stars

Top 69.2% on SourcePulse

Project Summary

FakeShield is a novel framework for explainable image forgery detection and localization (e-IFDL), targeting researchers and practitioners in digital forensics and AI security. It leverages multi-modal large language models (MLLMs) to not only identify manipulated regions but also provide human-understandable explanations for the detected forgeries, addressing the opacity of traditional methods.

How It Works

FakeShield integrates a Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and a Multimodal Forgery Localization Module (MFLM). The DTE-FDM analyzes pixel-level artifacts and semantic inconsistencies, guided by domain tags that indicate the type of manipulation, and outputs a detection verdict together with a textual explanation. The MFLM then uses the image and that explanation to localize the manipulated regions as a mask. Pairing the mask with the explanation makes the result interpretable, and the domain tags are intended to improve generalization and robustness across diverse forgery types.
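The two-stage flow can be sketched in a few lines of Python. This is only an illustration of the data flow, assuming (as the FakeShield paper describes) that the detection stage emits the verdict and explanation and the localization stage turns them into a mask. Every name here (run_pipeline, DetectionResult, ForgeryReport, the toy stand-in functions) is hypothetical and not part of the FakeShield codebase; skipping localization for authentic images is a simplification of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = List[List[int]]  # toy grayscale image as nested lists
Mask = List[List[int]]   # binary mask, 1 marks a pixel flagged as tampered


@dataclass
class DetectionResult:
    is_forged: bool
    explanation: str  # human-readable justification from the detection stage


@dataclass
class ForgeryReport:
    is_forged: bool
    explanation: str
    mask: Mask


def run_pipeline(
    image: Image,
    domain_tag: str,
    detect: Callable[[Image, str], DetectionResult],
    localize: Callable[[Image, str], Mask],
) -> ForgeryReport:
    """Stage 1 (DTE-FDM role): verdict + explanation. Stage 2 (MFLM role): mask."""
    detection = detect(image, domain_tag)
    if not detection.is_forged:
        # Assumption of this sketch: authentic images get an all-zero mask.
        empty = [[0] * len(row) for row in image]
        return ForgeryReport(False, detection.explanation, empty)
    mask = localize(image, detection.explanation)
    return ForgeryReport(True, detection.explanation, mask)


# Toy stand-ins for the two learned modules (hypothetical, for illustration only).
def toy_detect(image: Image, domain_tag: str) -> DetectionResult:
    suspicious = any(value > 200 for row in image for value in row)
    if suspicious:
        return DetectionResult(True, f"[{domain_tag}] unusually bright region")
    return DetectionResult(False, f"[{domain_tag}] no anomalies found")


def toy_localize(image: Image, explanation: str) -> Mask:
    return [[1 if value > 200 else 0 for value in row] for row in image]
```

Calling run_pipeline(image, "photoshop", toy_detect, toy_localize) returns a single report holding the verdict, the explanation, and the mask, which mirrors how the explanation produced by the first stage is passed on to the second.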

Quick Start & Requirements

Installation: Pip installation requires Python 3.9, PyTorch 1.13.0, and CUDA 11.6. Docker installation is recommended for reproducing the paper's results; pre-built images are available as zhipeixu/dte-fdm and zhipeixu/mflm.
Dependencies: Requires MMCV v1.4.7 and Flash Attention.
Model Weights: Download the FakeShield weights from Hugging Face (zhipeixu/fakeshield-v1-22b), along with the SAM pre-trained weights.
Demo: A CLI demo script (scripts/cli_demo.sh) is provided.
Resources: Training uses several large datasets (CASIAv2, FFHQ, FaceAPP, SD_inpaint, MMTD-Set).
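Because the pip route pins exact versions, it can help to check an environment before installing the remaining dependencies. The following is a minimal sketch, not a script shipped with FakeShield: the helper names (find_mismatches, detect_versions) are hypothetical, and only the pinned versions come from the requirements listed above.

```python
import platform
from typing import Dict, List, Optional

# Versions listed for the pip installation route.
REQUIRED = {"python": "3.9", "torch": "1.13.0", "cuda": "11.6"}


def _matches(actual: Optional[str], required: str) -> bool:
    """Compare only as many version components as the requirement specifies."""
    if actual is None:
        return False
    base = actual.split("+")[0]  # drop local build tags such as "+cu116"
    required_parts = required.split(".")
    return base.split(".")[: len(required_parts)] == required_parts


def find_mismatches(found: Dict[str, Optional[str]]) -> List[str]:
    """Return one message per component that differs from REQUIRED."""
    problems = []
    for name, required in REQUIRED.items():
        actual = found.get(name)
        if not _matches(actual, required):
            problems.append(
                f"{name}: need {required}, found {actual or 'not installed'}"
            )
    return problems


def detect_versions() -> Dict[str, Optional[str]]:
    """Collect the versions present in the current interpreter."""
    versions: Dict[str, Optional[str]] = {
        "python": platform.python_version(),
        "torch": None,
        "cuda": None,
    }
    try:
        import torch
    except ImportError:
        return versions
    versions["torch"] = torch.__version__
    versions["cuda"] = torch.version.cuda  # None on CPU-only builds
    return versions


if __name__ == "__main__":
    messages = find_mismatches(detect_versions())
    for line in messages or ["environment matches the pinned versions"]:
        print(line)
```

If the check reports mismatches, the Docker images are likely the simpler path, since they are the route recommended for reproducing the paper's results.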

Highlighted Details

Proposes the explainable image forgery detection and localization (e-IFDL) task, which the authors present as the first of its kind.
Introduces the MMTD-Set dataset with multi-modal descriptions for enhanced learning.
Supports detection of various forgeries including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.
Accepted at ICLR 2025.

Maintenance & Community

The project is associated with Peking University. The README links to the arXiv paper, the Hugging Face checkpoints and datasets, and the project page, and it highlights related projects such as AvatarShield and EditGuard.

Licensing & Compatibility

The project is licensed under Apache 2.0, permitting commercial use and closed-source linking.

Limitations & Caveats

The README recommends Docker for reproducing the paper's results, which suggests that a direct pip installation may be harder to get right. The pip route requires specific versions: Python 3.9, PyTorch 1.13.0, and CUDA 11.6.

Health Check

Last Commit: 4 days ago

Responsiveness: Inactive

Star History

91 stars in the last 30 days