detextify by iuliaturc

Python library for removing text artifacts from AI-generated images

created 2 years ago
295 stars

Top 90.7% on sourcepulse

Project Summary

Detextify is a Python library designed to remove unwanted text artifacts from images generated by AI models like Stable Diffusion, Midjourney, and DALL-E. It targets users and developers working with generative AI who need to improve the quality and usability of AI-generated images by cleaning up common text-based distortions.

How It Works

The library employs a two-stage process: text detection and in-painting. It first identifies and masks text regions within an image using a chosen text detection method (e.g., Tesseract or Azure Computer Vision API). Subsequently, it uses an in-painting model (e.g., Stable Diffusion, OpenAI DALL-E 2, or Replicate API) to fill in the masked areas, effectively removing the text and reconstructing the background. This modular approach allows flexibility in choosing between local processing or API-based services for both detection and in-painting.
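The masking step that connects the two stages can be illustrated with a minimal sketch. It assumes the detector returns axis-aligned boxes in pixel coordinates. `TextBox`, `build_mask`, and the `padding` parameter are hypothetical names chosen for illustration and are not taken from detextify's API. The mask is a plain nested list so the example runs without an image library.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextBox:
    """Hypothetical detector output: a box around text, in pixels."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height


# 255 marks a pixel to repaint, 0 marks a pixel to keep.
Mask = List[List[int]]


def build_mask(width: int, height: int, boxes: List[TextBox],
               padding: int = 2) -> Mask:
    """Rasterize text boxes into a binary mask, clipped to the image."""
    mask = [[0] * width for _ in range(height)]
    for box in boxes:
        # Grow each box by `padding`, then clip to the image bounds.
        x0 = max(0, box.x - padding)
        y0 = max(0, box.y - padding)
        x1 = min(width, box.x + box.w + padding)
        y1 = min(height, box.y + box.h + padding)
        for row in range(y0, y1):
            for col in range(x0, x1):
                mask[row][col] = 255
    return mask


if __name__ == "__main__":
    demo = build_mask(10, 8, [TextBox(x=2, y=3, w=4, h=2)], padding=1)
    for row in demo:
        print("".join("#" if v else "." for v in row))
```

In-painting models commonly take a mask of this kind alongside the original image. Padding the boxes slightly is a design choice that may help cover soft glyph edges the detector's box misses.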

Quick Start & Requirements

  • Install via pip: pip install detextify
  • Local execution requires Tesseract OCR for text detection.
  • Local in-painting requires a GPU with CUDA and cuDNN installed; it defaults to Stable Diffusion v2.
  • API-based options require relevant API keys (Azure, OpenAI, Replicate).
  • See the Colab notebook for usage examples.
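
The requirements above can be checked before a first run with a small script. This is a rough sketch, not part of detextify: the function name `check_prerequisites` and the environment variable names (`AZURE_CV_KEY`, `OPENAI_API_KEY`, `REPLICATE_API_TOKEN`) are assumptions, so substitute whatever names your setup actually uses. Finding `nvidia-smi` only hints that an NVIDIA driver is present; it does not confirm a working CUDA or cuDNN install.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def check_prerequisites(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, bool]:
    """Report which local tools and API keys appear to be available.

    The environment variable names are illustrative assumptions.
    """
    env = os.environ if env is None else env
    return {
        # Local text detection needs the Tesseract binary.
        "tesseract_on_path": shutil.which("tesseract") is not None,
        # Weak hint of an NVIDIA GPU driver for local in-painting.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
        # API-based options need their respective credentials.
        "azure_key_set": bool(env.get("AZURE_CV_KEY")),
        "openai_key_set": bool(env.get("OPENAI_API_KEY")),
        "replicate_token_set": bool(env.get("REPLICATE_API_TOKEN")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```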

Highlighted Details

  • Supports multiple text detection backends: Tesseract (local) and Azure Computer Vision (API).
  • Offers various in-painting options: Local Stable Diffusion (GPU required), Replicate API, and OpenAI DALL-E 2 API.
  • Provides a clear Python API for programmatic use and batch processing of images.
  • Designed for extensibility: custom text detectors and in-painters can be added.
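
The pluggable design and batch use described above could look roughly like the sketch below. It does not reproduce detextify's actual classes: `Detector`, `Inpainter`, and `detextify_batch` are hypothetical names, and the detector and in-painter are modeled as plain callables so that a local or API-based backend can be swapped in. The function does not create `out_dir`; the caller is assumed to do that.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

# (x, y, width, height) in pixels; a hypothetical box format.
Box = Tuple[int, int, int, int]
# A detector maps an image path to the text boxes it found.
Detector = Callable[[Path], List[Box]]
# An in-painter reads `src`, repaints the boxes, and writes `dst`.
Inpainter = Callable[[Path, List[Box], Path], None]


def detextify_batch(inputs: Iterable[Path], out_dir: Path,
                    detect: Detector, inpaint: Inpainter) -> List[Path]:
    """Run detect-then-inpaint over many images.

    Images with no detected text are skipped. Returns the output
    paths handed to the in-painter, in input order.
    """
    written: List[Path] = []
    for src in inputs:
        boxes = detect(src)
        if not boxes:
            continue  # nothing to remove, so skip the costly stage
        dst = out_dir / src.name
        inpaint(src, boxes, dst)
        written.append(dst)
    return written
```

Skipping images without detected text avoids paying for an in-painting call (GPU time or an API request) when there is nothing to repaint.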

Maintenance & Community

Authored by Mihail Eric and Julia Turc. The project encourages contributions via pull requests after cloning and setting up with Poetry.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Local in-painting is computationally intensive and requires specific GPU hardware and CUDA setup. The effectiveness of text removal depends on the quality of the text detection and the in-painting model's ability to reconstruct the background.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 13 stars in the last 90 days
