augraphy by sparkfish

Python library for document image augmentation

created 4 years ago
446 stars

Top 68.4% on sourcepulse

Project Summary

Augraphy is a Python library designed to generate realistic synthetic document distortions simulating paper printing, faxing, scanning, and copying processes. It targets AI/ML researchers and engineers working on document analysis tasks like OCR, form recognition, and document restoration, enabling the creation of large, diverse training datasets from clean source documents.

How It Works

Augraphy employs a pipeline-based approach. It first extracts text and graphics ("ink") from a clean document, then applies a series of distortions to this ink layer. Simultaneously, a "paper factory" provides a base paper layer, which can also undergo distortions. The processed ink and paper layers are merged, and further augmentations like folds or physical deformations are applied. This layered, multi-stage process allows for the creation of highly varied and realistic degraded document images.
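
The layered flow described above can be illustrated with a small NumPy sketch. This is a toy model, not Augraphy's implementation: every function name, threshold, and distortion below is a hypothetical stand-in chosen only to show the order of the phases (extract ink, distort ink, build paper, merge, post-process).

```python
import numpy as np

def extract_ink(image, threshold=128):
    """Boolean mask of 'ink' pixels: anything darker than the threshold."""
    return image < threshold

def distort_ink(ink_mask, rng, dropout=0.1):
    """Toy ink-phase distortion: randomly drop a fraction of ink pixels."""
    keep = rng.random(ink_mask.shape) >= dropout
    return ink_mask & keep

def make_paper(shape, rng, base=235, noise=10):
    """Toy 'paper factory': a light gray sheet with mild random noise."""
    paper = base + rng.integers(-noise, noise + 1, size=shape)
    return np.clip(paper, 0, 255).astype(np.uint8)

def merge(ink_mask, paper, ink_value=30):
    """Print the ink onto the paper: ink pixels turn dark, the rest stay paper."""
    out = paper.copy()
    out[ink_mask] = ink_value
    return out

def post_phase(image, rng, max_shift=2):
    """Toy post-phase deformation: shift the merged page sideways by a few pixels."""
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(image, shift, axis=1)

def toy_pipeline(image, seed=0):
    """Run the phases in order on a grayscale uint8 image."""
    rng = np.random.default_rng(seed)
    ink = distort_ink(extract_ink(image), rng)
    paper = make_paper(image.shape, rng)
    return post_phase(merge(ink, paper), rng)

# A clean 32x32 "document": a white page with one black bar of text.
clean = np.full((32, 32), 255, dtype=np.uint8)
clean[12:16, 4:28] = 0
degraded = toy_pipeline(clean)
```

Keeping ink and paper as separate layers until the merge step is what lets each one be distorted independently; the real library offers far richer effects in each phase.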

Quick Start & Requirements

  • Install via pip: pip install augraphy
  • Requires Python 3.7+ and OpenCV (opencv-python).
  • Example usage:
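      import cv2
      from augraphy import default_augraphy_pipeline

      image = cv2.imread("image.png")  # placeholder path to a clean document image
      pipeline = default_augraphy_pipeline()
      augmented = pipeline(image)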
  • Full documentation is available in the doc directory.

Highlighted Details

  • Offers over 50 distinct augmentation techniques, categorized into pixel-level and spatial-level effects.
  • Spatial augmentations can affect image, alpha layer, masks, keypoints, and bounding boxes.
  • Benchmarks on the Tobacco3482 dataset show that throughput varies across augmentations; some operations, such as Geometric and SectionShift, achieve high image throughput.
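
The pixel-level versus spatial-level distinction matters for annotations: a pixel-level effect leaves coordinates alone, while a spatial-level effect has to move bounding boxes along with the pixels. The sketch below shows the idea in plain NumPy. The function names and the box layout ([x_min, y_min, x_max, y_max] with exclusive max edges) are assumptions for illustration and are not Augraphy's API.

```python
import numpy as np

def adjust_brightness(image, delta):
    """Pixel-level effect: changes intensities only, so annotations are untouched."""
    shifted = image.astype(np.int16) + delta
    return np.clip(shifted, 0, 255).astype(np.uint8)

def flip_horizontal(image, boxes):
    """Spatial-level effect: mirrors the image, so the boxes are mirrored too.

    boxes is an (N, 4) integer array of [x_min, y_min, x_max, y_max].
    """
    width = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = boxes.copy()
    new_boxes[:, 0] = width - boxes[:, 2]  # old right edge becomes new left edge
    new_boxes[:, 2] = width - boxes[:, 0]  # old left edge becomes new right edge
    return flipped, new_boxes

# A 20x40 white page with one dark "word" near the left edge and its box.
page = np.full((20, 40), 255, dtype=np.uint8)
page[5:10, 2:12] = 0
boxes = np.array([[2, 5, 12, 10]])

brighter = adjust_brightness(page, 20)                 # boxes stay valid as-is
flipped, flipped_boxes = flip_horizontal(page, boxes)  # boxes must be updated
x0, y0, x1, y1 = flipped_boxes[0]
```

After the flip, the updated box still encloses exactly the dark region, which is the property a spatial augmentation has to preserve for masks, keypoints, and bounding boxes.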

Maintenance & Community

  • The project is maintained by The Augraphy Project.
  • Contributions via pull requests are welcome; feature requests should be discussed via issues.
  • BibTeX citations are provided for research use.

Licensing & Compatibility

  • Distributed under the MIT license.
  • Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The library is extensive, but it focuses on paper-oriented distortions; camera-phone distortions are a future consideration.
  • Performance varies significantly between augmentation types, with some being computationally intensive.
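
Because cost differs so much between augmentation types, it can help to time candidates on a sample of your own images before building a large pipeline. The helper below is a minimal sketch using only the standard library; `images_per_second` and the two stand-in augmentations are hypothetical names, and any callable that takes an image could be passed in their place.

```python
import time

def images_per_second(augment, images, repeats=3):
    """Return the best observed throughput of `augment` over `images`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for image in images:
            augment(image)
        best = min(best, time.perf_counter() - start)
    return len(images) / best if best > 0 else float("inf")

# Hypothetical stand-ins: images are nested lists of grayscale values.
def cheap_augment(image):
    """Mirror each row; a single pass over the pixels."""
    return [row[::-1] for row in image]

def costly_augment(image):
    """Replace each pixel with the mean of its row, recomputed per pixel."""
    return [[sum(row) // len(row) for _ in row] for row in image]

sample = [[[(x * y) % 256 for x in range(64)] for y in range(64)] for _ in range(5)]
cheap_rate = images_per_second(cheap_augment, sample)
costly_rate = images_per_second(costly_augment, sample)
```

Taking the best of several repeats reduces the influence of one-off system noise; absolute numbers depend on hardware and image size.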

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1

Star History

  • 35 stars in the last 90 days
