FakeShield by zhipeixu

Explainable image forgery detection and localization using MLLMs

Created 1 year ago
334 stars

Top 82.2% on SourcePulse

Project Summary

FakeShield is a novel framework for explainable image forgery detection and localization (e-IFDL), targeting researchers and practitioners in digital forensics and AI security. It leverages multi-modal large language models (MLLMs) to not only identify manipulated regions but also provide human-understandable explanations for the detected forgeries, addressing the opacity of traditional methods.

How It Works

FakeShield integrates a Domain Tag-guided Explainable Forgery Detection Module (DTE-FDM) and a Multimodal Forgery Localization Module (MFLM). The DTE-FDM analyzes pixel-level artifacts and semantic inconsistencies, guided by domain tags to recognize various manipulation techniques. The MFLM then localizes these manipulations and generates textual explanations, enhancing interpretability. This multi-modal approach aims for improved generalization and robustness across diverse forgery types.
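The two-stage flow described above can be sketched as plain data passing between a detection step and a localization step. This is only an illustration of the data flow, not FakeShield's actual API: the names `Detection`, `ForgeryReport`, `run_pipeline`, `toy_detect`, and `toy_localize` are all hypothetical, and the toy functions use a trivial pixel rule in place of the real MLLM-based modules.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A tiny grayscale "image" as nested lists keeps the sketch dependency-free.
Image = List[List[int]]
Mask = List[List[int]]  # 1 marks a pixel flagged as manipulated


@dataclass
class Detection:
    """Hypothetical output of the detection stage (stand-in for DTE-FDM)."""
    is_forged: bool
    domain_tag: str  # hypothetical label for the suspected manipulation domain


@dataclass
class ForgeryReport:
    """Hypothetical combined result: verdict, region mask, and explanation."""
    is_forged: bool
    domain_tag: str
    mask: Optional[Mask]
    explanation: str


def run_pipeline(
    image: Image,
    detect: Callable[[Image], Detection],
    localize: Callable[[Image, Detection], Tuple[Mask, str]],
) -> ForgeryReport:
    """Run detection first, then localization only if a forgery is flagged."""
    detection = detect(image)
    if not detection.is_forged:
        return ForgeryReport(False, detection.domain_tag, None,
                             "No manipulation detected.")
    mask, explanation = localize(image, detection)
    return ForgeryReport(True, detection.domain_tag, mask, explanation)


def toy_detect(image: Image) -> Detection:
    # Toy rule (assumption): any fully saturated pixel counts as "forged".
    forged = any(px == 255 for row in image for px in row)
    return Detection(is_forged=forged, domain_tag="toy-domain")


def toy_localize(image: Image, detection: Detection) -> Tuple[Mask, str]:
    # Toy rule (assumption): the mask marks exactly the saturated pixels.
    mask = [[1 if px == 255 else 0 for px in row] for row in image]
    count = sum(sum(row) for row in mask)
    text = f"{count} saturated pixel(s) flagged under tag '{detection.domain_tag}'."
    return mask, text


report = run_pipeline([[0, 255], [10, 20]], toy_detect, toy_localize)
print(report.is_forged, report.mask, report.explanation)
```

Passing the two stages in as callables keeps the sketch independent of any particular model; in the real project each stage is backed by its own model and environment.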

Quick Start & Requirements

  • Installation: Pip installation requires Python 3.9, PyTorch 1.13.0, and CUDA 11.6. Docker installation is recommended for reproducing paper results, with pre-built images available as zhipeixu/dte-fdm and zhipeixu/mflm.
  • Dependencies: Requires MMCV v1.4.7 and Flash Attention.
  • Model Weights: Download the FakeShield weights from Hugging Face (zhipeixu/fakeshield-v1-22b), along with the SAM pre-trained weights.
  • Demo: A CLI demo script (scripts/cli_demo.sh) is provided.
  • Resources: Training involves substantial datasets (CASIAv2, FFHQ, FaceAPP, SD_inpaint, MMTD-Set).
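Because the pip route pins exact versions, a quick pre-flight check can save a failed install. The following is a minimal sketch, not part of the FakeShield repository: `find_mismatches` and `inspect_current_environment` are hypothetical helper names, and the required versions are taken from the list above. It relies only on `sys.version_info`, `torch.__version__`, and `torch.version.cuda`.

```python
import sys
from typing import List, Optional, Sequence

# Versions listed in the requirements above.
REQUIRED_PYTHON = (3, 9)
REQUIRED_TORCH = "1.13.0"
REQUIRED_CUDA = "11.6"


def find_mismatches(
    python_version: Sequence[int],
    torch_version: Optional[str],
    cuda_version: Optional[str],
) -> List[str]:
    """Return a human-readable list of version mismatches (empty if none)."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        found = f"{python_version[0]}.{python_version[1]}"
        problems.append(f"Python {found} found, 3.9 expected")
    if torch_version is None:
        problems.append(f"PyTorch not installed, {REQUIRED_TORCH} expected")
        return problems
    # PyTorch builds often carry a local suffix such as "1.13.0+cu116".
    if torch_version.split("+")[0] != REQUIRED_TORCH:
        problems.append(f"PyTorch {torch_version} found, {REQUIRED_TORCH} expected")
    # torch.version.cuda is None on CPU-only builds.
    if cuda_version != REQUIRED_CUDA:
        problems.append(f"CUDA {cuda_version} found, {REQUIRED_CUDA} expected")
    return problems


def inspect_current_environment() -> List[str]:
    """Check the running interpreter and, if importable, the installed PyTorch."""
    try:
        import torch
    except ImportError:
        return find_mismatches(sys.version_info, None, None)
    return find_mismatches(sys.version_info, torch.__version__, torch.version.cuda)


if __name__ == "__main__":
    issues = inspect_current_environment()
    for issue in issues:
        print(issue)
    if not issues:
        print("Environment matches the listed requirements.")
```

If the check reports mismatches, the Docker images are likely the simpler path.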

Highlighted Details

  • Introduces explainable image forgery detection and localization (e-IFDL) as a new task, presented as the first of its kind.
  • Introduces the MMTD-Set dataset with multi-modal descriptions for enhanced learning.
  • Supports detection of various forgeries including copy-move, splicing, removal, DeepFake, and AI-generated manipulations.
  • Accepted at ICLR 2025.

Maintenance & Community

The project is associated with Peking University. Links to arXiv, Hugging Face checkpoints and datasets, and project pages are provided. Related projects like AvatarShield and EditGuard are also highlighted.

Licensing & Compatibility

The project is licensed under Apache 2.0, permitting commercial use and closed-source linking.

Limitations & Caveats

The README recommends Docker for reproducing paper results, which suggests that a direct pip installation may be harder to get right. The pip route requires specific versions: Python 3.9, PyTorch 1.13.0, and CUDA 11.6.

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

13 stars in the last 30 days
