augraphy by sparkfish

Python library for document image augmentation

Created 4 years ago

493 stars

Top 62.8% on SourcePulse

1 Expert Loves This Project

Guillaume Lample

Cofounder of Mistral

Project Summary

Augraphy is a Python library designed to generate realistic synthetic document distortions simulating paper printing, faxing, scanning, and copying processes. It targets AI/ML researchers and engineers working on document analysis tasks like OCR, form recognition, and document restoration, enabling the creation of large, diverse training datasets from clean source documents.

How It Works

Augraphy employs a pipeline-based approach. It first extracts text and graphics ("ink") from a clean document, then applies a series of distortions to this ink layer. Simultaneously, a "paper factory" provides a base paper layer, which can also undergo distortions. The processed ink and paper layers are merged, and further augmentations like folds or physical deformations are applied. This layered, multi-stage process allows for the creation of highly varied and realistic degraded document images.
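
The layered flow described above can be illustrated with a small plain-Python sketch. It is not Augraphy's implementation and uses none of its API: the function names (extract_ink, fade_ink, make_paper, merge, darken_fold), the threshold, and the tiny grayscale image stored as nested lists are all assumptions chosen for illustration.

```python
import random

def extract_ink(image, threshold=128):
    """Return a mask that is True where a pixel is dark enough to count as ink."""
    return [[px < threshold for px in row] for row in image]

def fade_ink(image, mask, amount=40):
    """Ink-phase distortion: lighten ink pixels to mimic a weak print."""
    return [[min(255, px + amount) if m else px for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

def make_paper(height, width, base=235, noise=10, seed=0):
    """Paper layer: an off-white sheet with slight random texture."""
    rng = random.Random(seed)
    return [[max(0, min(255, base + rng.randint(-noise, noise)))
             for _ in range(width)] for _ in range(height)]

def merge(ink_image, mask, paper):
    """Lay the ink pixels over the paper layer."""
    return [[i if m else p for i, m, p in zip(irow, mrow, prow)]
            for irow, mrow, prow in zip(ink_image, mask, paper)]

def darken_fold(image, column, strength=30):
    """Post-phase distortion: darken one column to mimic a fold crease."""
    return [[max(0, px - strength) if x == column else px
             for x, px in enumerate(row)] for row in image]

def run_pipeline(image):
    mask = extract_ink(image)                      # separate ink from background
    ink = fade_ink(image, mask)                    # distort the ink layer
    paper = make_paper(len(image), len(image[0]))  # build the paper layer
    merged = merge(ink, mask, paper)               # combine both layers
    return darken_fold(merged, column=len(image[0]) // 2)  # post-merge effect

# A 3x4 "document": 0 is black ink, 255 is a clean white background.
clean = [[255, 0, 255, 255],
         [255, 0, 0, 255],
         [255, 255, 255, 255]]
result = run_pipeline(clean)
```

Keeping the ink and paper layers separate until the merge step is what lets each one be distorted independently before whole-page effects are applied.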

Quick Start & Requirements

Install via pip: pip install augraphy
Requires Python 3.7+ and OpenCV (opencv-python).
Example usage:

import cv2
from augraphy import *

image = cv2.imread("input.png")  # hypothetical path to a clean document image
pipeline = default_augraphy_pipeline()
augmented = pipeline(image)

Full documentation is available in the doc directory.

Highlighted Details

Offers over 50 distinct augmentation techniques, categorized into pixel-level and spatial-level effects.
Spatial augmentations can affect image, alpha layer, masks, keypoints, and bounding boxes.
Benchmarks on the Tobacco3482 dataset show that processing speed varies across augmentations; some operations, such as Geometric and SectionShift, achieve high image throughput.
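
To show why a spatial augmentation has to touch masks, keypoints, and bounding boxes as well as the image, here is a minimal sketch of a horizontal flip applied consistently to all of them. It is plain Python rather than Augraphy's API; the helper names and the inclusive (x_min, y_min, x_max, y_max) box convention are assumptions made for the example.

```python
def hflip_image(image):
    """Mirror each row of a 2D image (or mask) left to right."""
    return [row[::-1] for row in image]

def hflip_keypoints(keypoints, width):
    """Mirror (x, y) pixel coordinates across the vertical centre line."""
    return [(width - 1 - x, y) for x, y in keypoints]

def hflip_boxes(boxes, width):
    """Mirror boxes given as inclusive (x_min, y_min, x_max, y_max) pixels.

    The old right edge becomes the new left edge, and vice versa.
    """
    return [(width - 1 - x_max, y_min, width - 1 - x_min, y_max)
            for x_min, y_min, x_max, y_max in boxes]

def hflip_all(image, mask, keypoints, boxes):
    """Apply the same flip to the image and every annotation tied to it."""
    width = len(image[0])
    return (hflip_image(image),
            hflip_image(mask),
            hflip_keypoints(keypoints, width),
            hflip_boxes(boxes, width))

image = [[10, 20, 30, 40],
         [50, 60, 70, 80]]
mask = [[0, 1, 1, 0],
        [0, 0, 1, 0]]
keypoints = [(1, 0)]      # points at the pixel with value 20
boxes = [(0, 0, 1, 1)]    # covers the left half of the image

flipped_image, flipped_mask, flipped_keypoints, flipped_boxes = hflip_all(
    image, mask, keypoints, boxes)
```

After the flip, the keypoint still lands on the pixel with value 20 and the box covers the right half, which is the property a label-preserving spatial transform needs.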

Maintenance & Community

The project is maintained by The Augraphy Project.
Contributions via pull requests are welcome; feature requests should be discussed via issues.
BibTeX citations are provided for research use.

Licensing & Compatibility

Distributed under the MIT license.
The permissive license allows commercial use and integration into closed-source projects.

Limitations & Caveats

The library, while extensive, focuses on paper-oriented distortions; camera-phone distortions are a future consideration.
Performance varies significantly between augmentation types, with some being computationally intensive.
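
One way to see how much cost differs between augmentations is to time them on the same batch of images. The sketch below is a generic harness in plain Python, not the project's benchmark script; measure_throughput, invert, and box_blur are hypothetical stand-ins for a cheap and a more expensive augmentation.

```python
import time

def measure_throughput(augment, images, repeats=3):
    """Return images processed per second, taking the best of several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for img in images:
            augment(img)
        best = min(best, time.perf_counter() - start)
    return len(images) / best if best > 0 else float("inf")

def invert(image):
    """Cheap per-pixel operation: invert grayscale values."""
    return [[255 - px for px in row] for row in image]

def box_blur(image):
    """Costlier operation: average each pixel with its 3x3 neighbourhood."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbours = [image[j][i]
                          for j in range(max(0, y - 1), min(h, y + 2))
                          for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(sum(neighbours) // len(neighbours))
        out.append(row)
    return out

# A small batch of synthetic 32x32 grayscale images.
batch = [[[(x * y + k) % 256 for x in range(32)] for y in range(32)]
         for k in range(10)]

for name, fn in [("invert", invert), ("box_blur", box_blur)]:
    print(f"{name}: {measure_throughput(fn, batch):.1f} images/s")
```

Absolute numbers depend on the machine and image size, so results from a harness like this are only meaningful as a relative comparison on the same data.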

Health Check

Last Commit

5 months ago

Responsiveness

1 day

Star History

6 stars in the last 30 days