DocRes by ZZZHANG-jx

Generalist model for document image restoration

Created 1 year ago
503 stars

Top 61.9% on SourcePulse

Project Summary

DocRes is a generalist model designed to unify various document image restoration tasks, including dewarping, deshadowing, appearance correction, deblurring, and binarization. It aims to provide a single, versatile solution for improving the quality of scanned or photographed documents, benefiting researchers and practitioners in document analysis and computer vision.

How It Works

DocRes uses a single architecture for all supported restoration tasks. According to the paper, the task is selected through a visual prompt, the Dynamic Task-Specific Prompt (DTSPrompt), which is built from prior features extracted from the input image and supplied to the restoration network alongside it. One set of weights therefore serves every task, avoiding the need to train and deploy task-specific models.

Quick Start & Requirements

  • Inference: Place the MBD weights (mbd.pkl) in ./data/MBD/checkpoint/ and the DocRes weights (docres.pkl) in ./checkpoints/, then run python inference.py --im_path <path_to_image> --task <dewarping|deshadowing|appearance|deblurring|binarization|end2end>.
  • Prerequisites: Python, PyTorch. Specific dataset preparation instructions are available in the repository.
  • Demo: An "Open in Spaces" link to Hugging Face Spaces is provided for interactive demonstration.
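
The inference step above can be wrapped in a small script. The following is a minimal sketch, not part of the repository: the helper names (missing_weights, build_inference_command, run_inference) are hypothetical, and it assumes only the weight locations, the --im_path and --task flags, and the task names listed above. It checks that both weight files are in place before calling inference.py.

```python
import subprocess
import sys
from pathlib import Path

# Task names as listed in the inference command above.
TASKS = ("dewarping", "deshadowing", "appearance",
         "deblurring", "binarization", "end2end")

# Weight locations as described above, relative to the repository root.
WEIGHTS = (Path("data/MBD/checkpoint/mbd.pkl"), Path("checkpoints/docres.pkl"))


def missing_weights(repo_root):
    """Return the expected weight files that are absent under repo_root."""
    root = Path(repo_root)
    return [p.as_posix() for p in WEIGHTS if not (root / p).is_file()]


def build_inference_command(im_path, task):
    """Build the argv list for inference.py (hypothetical helper).

    Raises ValueError if the task is not one of the documented names.
    """
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    return [sys.executable, "inference.py",
            "--im_path", str(im_path), "--task", task]


def run_inference(repo_root, im_path, task):
    """Run inference.py from repo_root after checking the weights exist."""
    missing = missing_weights(repo_root)
    if missing:
        raise FileNotFoundError(f"missing model weights: {missing}")
    # check=True raises CalledProcessError if inference.py exits non-zero.
    subprocess.run(build_inference_command(im_path, task),
                   cwd=repo_root, check=True)
```

Because the script runs with the repository root as its working directory, a relative image path is resolved against that root. A call such as run_inference(".", "input/page.jpg", "deshadowing") would then process one image, where input/page.jpg is a placeholder path.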

Highlighted Details

  • Official implementation of the CVPR 2024 paper "DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks".
  • Supports multiple restoration tasks: dewarping, deshadowing, appearance, deblurring, binarization, and end-to-end restoration.
  • Includes evaluation scripts for several datasets, including realdae, tdd, and dibco18.
  • Training scripts are available for custom model training.

Maintenance & Community

The repository is the official code for a CVPR 2024 paper. Recent updates include an evaluation of generative models such as GPT-4o on document processing tasks. The README also links to related arXiv papers and to LGGPT (IJCV 2025), which indicates ongoing research in related areas.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing terms for commercial use or integration into closed-source projects.

Limitations & Caveats

The README focuses on inference and evaluation setup; detailed dataset preparation and training instructions must be found elsewhere in the repository. Hardware requirements are not stated, although a GPU is presumably needed for efficient operation.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 16 stars in the last 30 days
