augraphy  by sparkfish

Python library for document image augmentation

Created 4 years ago
458 stars


Project Summary

Augraphy is a Python library designed to generate realistic synthetic document distortions simulating paper printing, faxing, scanning, and copying processes. It targets AI/ML researchers and engineers working on document analysis tasks like OCR, form recognition, and document restoration, enabling the creation of large, diverse training datasets from clean source documents.

How It Works

Augraphy employs a pipeline-based approach. It first extracts text and graphics ("ink") from a clean document, then applies a series of distortions to this ink layer. Simultaneously, a "paper factory" provides a base paper layer, which can also undergo distortions. The processed ink and paper layers are merged, and further augmentations like folds or physical deformations are applied. This layered, multi-stage process allows for the creation of highly varied and realistic degraded document images.
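
The layered flow described above can be sketched in plain Python. This is only a conceptual illustration on tiny grayscale grids, not Augraphy's implementation: every function name here (split_ink, fade_ink, make_paper, merge, shadow_edge, run_pipeline) is hypothetical, and the "darker pixel wins" merge rule is an assumption chosen for simplicity.

```python
import random

WHITE, BLACK = 255, 0

def split_ink(image, threshold=128):
    """Keep dark pixels as ink; everything else becomes blank (white)."""
    return [[px if px < threshold else WHITE for px in row] for row in image]

def fade_ink(ink, amount=60):
    """Ink-phase distortion: lighten ink pixels to mimic low toner."""
    return [[min(WHITE, px + amount) if px < WHITE else WHITE for px in row]
            for row in ink]

def make_paper(height, width, base=235, noise=10, seed=0):
    """Paper phase: an off-white sheet with slight random texture."""
    rng = random.Random(seed)
    return [[base + rng.randint(-noise, noise) for _ in range(width)]
            for _ in range(height)]

def merge(ink, paper):
    """Print ink onto paper: the darker value wins at each pixel (assumed rule)."""
    return [[min(i, p) for i, p in zip(ink_row, paper_row)]
            for ink_row, paper_row in zip(ink, paper)]

def shadow_edge(image, strength=40):
    """Post-phase distortion: darken the left column, like a scanner shadow."""
    return [[max(BLACK, row[0] - strength)] + row[1:] for row in image]

def run_pipeline(image):
    """Ink phase, paper phase, merge, then post phase."""
    ink = fade_ink(split_ink(image))
    paper = make_paper(len(image), len(image[0]))
    return shadow_edge(merge(ink, paper))

# A 3x4 "document": white page with two black ink pixels in the middle row.
clean = [
    [255, 255, 255, 255],
    [255, 0, 0, 255],
    [255, 255, 255, 255],
]
degraded = run_pipeline(clean)
```

Keeping the ink and paper layers separate until the merge step is what lets each layer receive its own distortions, which is the property the description above attributes to the pipeline design.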

Quick Start & Requirements

  • Install via pip: pip install augraphy
  • Requires Python 3.7+ and OpenCV (opencv-python).
  • Example usage:
    import cv2
    from augraphy import *

    image = cv2.imread("input.png")  # placeholder path to a clean document image
    pipeline = default_augraphy_pipeline()
    augmented = pipeline(image)
  • Full documentation is available in the doc directory.

Highlighted Details

  • Offers over 50 distinct augmentation techniques, categorized into pixel-level and spatial-level effects.
  • Spatial augmentations can affect image, alpha layer, masks, keypoints, and bounding boxes.
  • Benchmarks on the Tobacco3482 dataset show that throughput varies widely between augmentations; operations such as Geometric and SectionShift are among those with high image throughput.
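
To illustrate what it means for a spatial augmentation to affect keypoints and bounding boxes as well as the image, here is a minimal sketch using a horizontal flip. It does not use Augraphy's API; the helper names and the coordinate convention (inclusive pixel coordinates, boxes as (x_min, y_min, x_max, y_max)) are assumptions made for this example.

```python
def hflip_image(image):
    """Mirror each row of a grayscale grid left to right."""
    return [row[::-1] for row in image]

def hflip_keypoint(point, width):
    """Mirror an (x, y) keypoint so it still lands on the same content."""
    x, y = point
    return (width - 1 - x, y)

def hflip_box(box, width):
    """Mirror a box; x_min and x_max swap roles after the flip."""
    x_min, y_min, x_max, y_max = box
    return (width - 1 - x_max, y_min, width - 1 - x_min, y_max)

image = [
    [0, 9, 0, 0, 0],
    [0, 0, 0, 0, 7],
]
width = len(image[0])
keypoints = [(1, 0), (4, 1)]   # locations of the 9 and the 7
box = (1, 0, 4, 1)             # encloses both marked pixels

flipped = hflip_image(image)
flipped_keypoints = [hflip_keypoint(p, width) for p in keypoints]
flipped_box = hflip_box(box, width)
```

After the flip, each keypoint still indexes the same pixel value in the flipped image, which is the consistency a pixel-level effect never has to worry about.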

Maintenance & Community

  • The project is maintained by The Augraphy Project.
  • Contributions via pull requests are welcome; feature requests should be discussed via issues.
  • BibTeX citations are provided for research use.

Licensing & Compatibility

  • Distributed under the MIT license.
  • Permissive license allows for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The library is extensive but focuses on paper-oriented distortions; camera-phone distortions are a future consideration.
  • Performance varies significantly between augmentation types, with some being computationally intensive.
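
One way to see how much cost differs between augmentation types is to measure images per second for each. The sketch below is a rough harness under stated assumptions: invert and box_blur are toy stand-ins written for this example, not Augraphy augmentations, and images_per_second is a hypothetical helper.

```python
import time

def invert(image):
    """Cheap pixel-level operation: one pass over the pixels."""
    return [[255 - px for px in row] for row in image]

def box_blur(image):
    """3x3 mean filter with edge clamping; deliberately heavier than invert."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            row.append(total // 9)
        out.append(row)
    return out

def images_per_second(augment, images, repeats=3):
    """Run the augmentation over all images and return the average throughput."""
    start = time.perf_counter()
    for _ in range(repeats):
        for img in images:
            augment(img)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return (repeats * len(images)) / elapsed

# Five synthetic 64x64 grayscale images.
images = [[[(x * y) % 256 for x in range(64)] for y in range(64)] for _ in range(5)]
throughput = {fn.__name__: images_per_second(fn, images) for fn in (invert, box_blur)}
```

The same harness could wrap any callable that takes an image, so it can be pointed at individual augmentations to decide which ones are affordable in a training loop.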

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30 days): 0
  • Issues (30 days): 0
  • Star history: 9 stars in the last 30 days
