DocRes by ZZZHANG-jx

Generalist model for document image restoration

Created 1 year ago

565 stars

Top 56.9% on SourcePulse

Project Summary

DocRes is a generalist model that unifies five document image restoration tasks: dewarping, deshadowing, appearance enhancement, deblurring, and binarization. It aims to provide a single solution for improving the quality of scanned or photographed documents, and is aimed at researchers and practitioners in document analysis and computer vision.

How It Works

DocRes uses a single network for all supported restoration tasks rather than one model per task. According to the paper, the task is selected through a Dynamic Task-Specific Prompt (DTSPrompt): a visual prompt built from prior features extracted from the input image, which tells the network which restoration to perform. Sharing one set of weights across tasks removes the need to train and deploy task-specific models.

Quick Start & Requirements

Inference: Place mbd.pkl in ./data/MBD/checkpoint/ and docres.pkl in ./checkpoints/, then run python inference.py --im_path <path_to_image> --task <dewarping|deshadowing|appearance|deblurring|binarization|end2end>.
Prerequisites: Python, PyTorch. Specific dataset preparation instructions are available in the repository.
Demo: An "Open in Spaces" link to Hugging Face Spaces is provided for interactive demonstration.
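The inference step above can be scripted. The following is a minimal sketch, not part of the repository: it assumes only the checkpoint paths and the --im_path and --task flags quoted above, and the helper names (missing_weights, build_command, run_inference) are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Task names as listed in the Quick Start text above.
TASKS = ("dewarping", "deshadowing", "appearance",
         "deblurring", "binarization", "end2end")

# Checkpoint locations as described above, relative to the repository root.
WEIGHTS = (Path("data/MBD/checkpoint/mbd.pkl"), Path("checkpoints/docres.pkl"))


def missing_weights(repo_root):
    """Return the checkpoint files that are not present under repo_root."""
    root = Path(repo_root)
    return [w for w in WEIGHTS if not (root / w).is_file()]


def build_command(im_path, task):
    """Build the inference.py argument list for one image and one task."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    return [sys.executable, "inference.py",
            "--im_path", str(im_path), "--task", task]


def run_inference(repo_root, im_path, task):
    """Hypothetical wrapper: check the weights, then run inference.py."""
    missing = missing_weights(repo_root)
    if missing:
        raise FileNotFoundError(f"missing weights: {[str(m) for m in missing]}")
    # Resolve the image path first, because the script runs with cwd=repo_root.
    cmd = build_command(Path(im_path).resolve(), task)
    return subprocess.run(cmd, cwd=repo_root, check=True)
```

For example, run_inference(".", "input/page.jpg", "end2end") would run the end-to-end task on a placeholder image path from inside a checkout of the repository. Validating the task name and the weight files up front gives a clearer error than a failure partway through the script.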

Highlighted Details

Official implementation of the CVPR 2024 paper "DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks".
Supports multiple restoration tasks: dewarping, deshadowing, appearance enhancement, deblurring, binarization, and end-to-end restoration.
Includes evaluation scripts for various datasets like realdae, tdd, dibco18, etc.
Training scripts are available for custom model training.

Maintenance & Community

The repository is the official implementation by the authors of the CVPR 2024 paper and has received recent updates, including an evaluation of generative models such as GPT-4o on document processing tasks. It also links to arXiv papers and to IJCV 2025 work (LGGPT), indicating active research in related areas.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing terms for commercial use or integration into closed-source projects.

Limitations & Caveats

The README concentrates on inference and evaluation setup; dataset preparation and training are covered in less detail and require exploring the repository. Hardware requirements are not stated, although a GPU is implied for efficient operation.
