Document-illustrator-skill by op7418

AI document-to-image generator

Created 1 month ago
284 stars

Top 92.2% on SourcePulse

Project Summary

This project provides an AI-powered tool that generates custom illustrations from documents. It is aimed at Claude Code users, content creators, and technical writers. It automates the creation of images that match the document's content, in several visual styles and aspect ratios, which speeds up producing visuals for articles, reports, and social media posts.

How It Works

The system runs as a Claude Code Skill. It ingests documents in common formats such as Markdown, TXT, and PDF, uses AI to understand the content and summarize its core themes, and presents these summaries to the user for confirmation. Once the user approves, it calls the Gemini API to generate images in one of three styles (gradient glass, ticket, or vector illustration) at a selectable aspect ratio. Because summarization is based on semantic understanding rather than document structure, it does not depend on the input format the way traditional format-specific parsers do, and the confirmation step lets the user control what gets illustrated before any images are generated.
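The flow above (summarize, confirm, then generate) can be sketched in Python. This is a minimal illustration, not the skill's code: `Theme`, `extract_themes`, `illustrate`, and the `confirm`/`generate` callbacks are hypothetical names, the paragraph-splitting summarizer is a placeholder for the AI step, and the 1000-character cap mirrors the truncation noted under Limitations & Caveats.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_INPUT_CHARS = 1000  # mirrors the documented input truncation


@dataclass
class Theme:
    title: str
    summary: str


def extract_themes(text: str) -> List[Theme]:
    """Hypothetical stand-in for the AI summarization step.

    The real skill asks a model to understand the document; here each
    non-empty paragraph of the truncated input becomes one theme so the
    control flow can be run without any API access.
    """
    themes = []
    for block in text[:MAX_INPUT_CHARS].split("\n\n"):
        block = block.strip()
        if block:
            first_line = block.splitlines()[0]
            themes.append(Theme(title=first_line[:40], summary=block))
    return themes


def illustrate(text: str,
               confirm: Callable[[List[Theme]], bool],
               generate: Callable[[Theme], str]) -> List[str]:
    """Summarize, ask the user to confirm, and only then generate images."""
    themes = extract_themes(text)
    if not confirm(themes):
        return []  # summary rejected: no image requests are made
    # In the real skill, this is where the Gemini API would be called.
    return [generate(theme) for theme in themes]
```

Passing a `confirm` callback that returns False skips generation entirely, which is the purpose of the confirmation step: no image requests (and no per-image cost) are made for a summary the user rejected.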

Quick Start & Requirements

  • Installation: The recommended method is npx skills add https://github.com/op7418/Document-illustrator-skill. Alternatively, install manually by cloning the repository into the Claude Skills directory.
  • Prerequisites:
    • Gemini API Key obtained from Google AI Studio.
    • Python 3.8+
    • Python dependencies: google-genai, pillow, python-dotenv.
    • Claude Code environment.
  • Verification: Run python3 scripts/generate_single_image.py --help.
  • Documentation: The README links to Google AI Studio, GitHub Issues, and GitHub Discussions.
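Before running the verification command, a small script along these lines could check the prerequisites listed above. It is a sketch under stated assumptions: `module_available` and `check_prerequisites` are hypothetical helpers, the module names are the import names of the three pip packages, and the environment variable name GEMINI_API_KEY is an assumption, since the skill's own configuration key is not stated here.

```python
import importlib.util
import os
import sys
from typing import List

# pip package name -> import name
REQUIRED_MODULES = {
    "google-genai": "google.genai",
    "pillow": "PIL",
    "python-dotenv": "dotenv",
}


def module_available(name: str) -> bool:
    """Return True if `name` can be imported, without importing it fully."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError covers a missing parent package (e.g. no "google").
        return False


def check_prerequisites(env_var: str = "GEMINI_API_KEY") -> List[str]:
    """List unmet prerequisites; `env_var` is an assumed variable name."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8+ is required")
    for pip_name, module in REQUIRED_MODULES.items():
        if not module_available(module):
            problems.append(f"missing package: {pip_name}")
    if not os.environ.get(env_var):
        problems.append(f"{env_var} is not set")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "All prerequisites found")
```

An empty result only means the packages and key are present; the key's validity is checked only when the Gemini API is actually called.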

Highlighted Details

  • AI-driven content summarization and theme extraction, independent of document format.
  • Three distinct visual styles: Gradient Glass (minimalist, glassmorphism), Ticket (minimalist, high-contrast), and Vector Illustration (flat, retro).
  • Supports 16:9 (landscape) and 3:4 (portrait) aspect ratios.
  • Optional cover image generation to summarize the entire document.
  • User confirmation step for summarized content before image generation.
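One way the style, aspect-ratio, and cover options could be combined into a single image prompt is sketched below. The descriptors paraphrase the three styles listed above, but the dictionary keys, the prompt wording, and `build_prompt` itself are hypothetical; the skill's actual prompts are not documented here.

```python
# Descriptors paraphrase the documented styles; keys and wording are hypothetical.
STYLES = {
    "gradient-glass": "minimalist glassmorphism with soft gradients",
    "ticket": "minimalist, high-contrast ticket layout",
    "vector": "flat, retro vector illustration",
}
# The two documented aspect ratios and their orientations.
ASPECT_RATIOS = {"16:9": "landscape", "3:4": "portrait"}


def build_prompt(theme: str, style: str = "gradient-glass",
                 aspect_ratio: str = "16:9", cover: bool = False) -> str:
    """Build an image prompt, rejecting unsupported options before any API call."""
    if style not in STYLES:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(STYLES)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(
            f"unsupported aspect ratio {aspect_ratio!r}; choose from {sorted(ASPECT_RATIOS)}"
        )
    if cover:
        subject = f"a cover image summarizing the whole document: {theme}"
    else:
        subject = f"an illustration of: {theme}"
    return (f"Create {subject}. Style: {STYLES[style]}. "
            f"Orientation: {ASPECT_RATIOS[aspect_ratio]} ({aspect_ratio}).")
```

Validating both options up front means an unsupported combination fails locally instead of costing an API request.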

Maintenance & Community

The project primarily uses GitHub Issues and Discussions for community interaction and support. The README does not list maintainers, contributors, or sponsors.

Licensing & Compatibility

The project is released under the MIT License, permitting free use for commercial and non-commercial purposes, modification, and distribution, provided the original license and copyright notice are included.

Limitations & Caveats

Image generation relies on the Gemini API, so each image incurs a cost, and generation can fail because of API-related issues such as network errors, quota limits, or service outages. The system truncates document input to the first 1000 characters. Batch processing is not supported natively and requires custom scripting. High-resolution (4K) generation increases both processing time and API costs.
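Since batch processing requires custom scripting, a wrapper like the following is one option. It is a minimal sketch: `load_text_documents` and `batch_illustrate` are hypothetical helpers, `generate` stands in for whatever call actually produces an image, and the retry with exponential backoff is a generic technique for transient network or quota errors, not something the project provides. Inputs are capped at 1000 characters to match the documented truncation, and only .md and .txt files are loaded because PDF text extraction needs an extra library.

```python
import time
from pathlib import Path
from typing import Callable, Dict

MAX_INPUT_CHARS = 1000  # matches the documented input truncation


def load_text_documents(folder: str) -> Dict[str, str]:
    """Hypothetical helper: read every .md and .txt file in a folder (PDFs are skipped)."""
    docs = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in {".md", ".txt"}:
            docs[path.name] = path.read_text(encoding="utf-8")
    return docs


def batch_illustrate(docs: Dict[str, str], generate: Callable[[str], str],
                     retries: int = 2, backoff_s: float = 1.0) -> Dict[str, str]:
    """Call `generate` once per document, retrying failures with exponential backoff."""
    results = {}
    for name, text in docs.items():
        snippet = text[:MAX_INPUT_CHARS]
        for attempt in range(retries + 1):
            try:
                results[name] = generate(snippet)
                break
            except Exception as exc:  # the SDK's exact exception types are not assumed
                if attempt == retries:
                    results[name] = f"FAILED: {exc}"  # record the failure, continue the batch
                else:
                    # With the default backoff_s: waits 1 s, then 2 s, then 4 s, ...
                    time.sleep(backoff_s * 2 ** attempt)
    return results
```

Each document is processed independently, so one failed request is recorded in the results instead of aborting the whole batch; every retry is a new request and may add to the per-image cost.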

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1
Star History: 51 stars in the last 30 days
