Document-illustrator-skill by op7418

AI document-to-image generator

Created 4 months ago

544 stars

Top 58.2% on SourcePulse

Project Summary

This project provides an AI-powered tool that generates custom illustrations from documents. It is aimed at Claude Code users, content creators, and technical writers. It automates the creation of contextually relevant images in several styles and aspect ratios, streamlining visual content creation for articles, reports, and social media posts.

How It Works

The system operates as a Claude Code Skill. It ingests documents in several formats (Markdown, TXT, PDF), uses AI to understand their content and summarize the core themes, and presents these summaries to the user for confirmation. Once the user approves, it calls the Gemini API to generate images in one of several artistic styles (gradient glass, ticket, or vector illustration) and in a choice of aspect ratios. Because summarization is semantic rather than tied to document structure, the tool does not depend on format-specific parsers, and the confirmation step keeps the user in control of what gets illustrated.
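The summarize, confirm, then generate flow described above can be sketched in Python. This is a minimal offline sketch under stated assumptions, not the skill's actual implementation. extract_themes is a hypothetical stand-in for Claude's semantic summarization that simply splits on blank lines. generate is a hypothetical callback where a real version would call the Gemini API through google-genai. The style identifier "gradient-glass" is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Assumed limit, mirroring the 1000-character truncation noted under Limitations.
MAX_INPUT_CHARS = 1000


@dataclass
class Theme:
    title: str
    summary: str


def extract_themes(text: str) -> List[Theme]:
    """Hypothetical stand-in summarizer: one theme per non-empty paragraph.

    The real skill asks Claude to summarize semantically; this stub only
    splits the truncated text on blank lines so the pipeline runs offline.
    """
    excerpt = text[:MAX_INPUT_CHARS]
    paragraphs = [p.strip() for p in excerpt.split("\n\n") if p.strip()]
    return [Theme(title=p.splitlines()[0][:60], summary=p) for p in paragraphs]


def illustrate(
    text: str,
    confirm: Callable[[List[Theme]], bool],
    generate: Callable[[Theme, str, str], bytes],
    style: str = "gradient-glass",  # hypothetical style identifier
    aspect_ratio: str = "16:9",
) -> List[bytes]:
    """Summarize, ask for confirmation, then generate one image per theme."""
    themes = extract_themes(text)
    if not confirm(themes):
        return []  # user declined: no generation calls are made
    return [generate(theme, style, aspect_ratio) for theme in themes]


if __name__ == "__main__":
    doc = "Intro\nWhy illustrations help.\n\nSetup\nInstall the skill."
    # Fake generator that returns a label instead of real image bytes.
    fake = lambda theme, style, ratio: f"{style}|{ratio}|{theme.title}".encode()
    print(illustrate(doc, confirm=lambda themes: True, generate=fake))
```

Injecting confirm and generate as callables means no API request (and no per-image cost) happens until the user approves the summary. It also lets the flow be exercised without a Gemini key.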

Quick Start & Requirements

Installation: The recommended method is npx skills add https://github.com/op7418/Document-illustrator-skill. Alternatively, install manually by cloning the repository into the Claude Skills directory.
Prerequisites:
- Gemini API Key obtained from Google AI Studio.
- Python 3.8+
- Python dependencies: google-genai, pillow, python-dotenv.
- Claude Code environment.
Verification: Run python3 scripts/generate_single_image.py --help.
Documentation: The README links to Google AI Studio, GitHub Issues, and GitHub Discussions.
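Before running the verification command, a quick check of the prerequisites listed above can catch setup problems early. The sketch below maps the pip package names to their import names: google-genai to google.genai, pillow to PIL, and python-dotenv to dotenv. It assumes the API key is supplied through an environment variable named GEMINI_API_KEY. That name is an assumption, so check the project's README for the variable it actually reads.

```python
import importlib.util
import os
import sys
from typing import List

# Import names for the listed dependencies (pip names differ from import names).
REQUIRED_MODULES = ["google.genai", "PIL", "dotenv"]

# ASSUMPTION: the key is read from this variable; confirm the name in the README.
API_KEY_VAR = "GEMINI_API_KEY"


def module_available(name: str) -> bool:
    """Return True if `name` is importable, without importing the module itself."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a parent package (e.g. "google") is missing.
        return False


def check_prerequisites(modules: List[str] = REQUIRED_MODULES, key_var: str = API_KEY_VAR) -> List[str]:
    """Return human-readable problems; an empty list means the environment looks ready."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append(f"Python 3.8+ required, found {sys.version.split()[0]}")
    for name in modules:
        if not module_available(name):
            problems.append(f"missing Python module: {name}")
    if not os.environ.get(key_var):
        problems.append(f"environment variable {key_var} is not set")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    for issue in issues:
        print("-", issue)
    sys.exit(1 if issues else 0)
```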

Highlighted Details

AI-driven content summarization and theme extraction, independent of document format.
Three distinct visual styles: Gradient Glass (minimalist, glassmorphism), Ticket (minimalist, high-contrast), and Vector Illustration (flat, retro).
Supports 16:9 (landscape) and 3:4 (portrait) aspect ratios.
Optional cover image generation to summarize the entire document.
User confirmation step for summarized content before image generation.
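The two aspect ratios translate into pixel sizes with simple arithmetic. For 16:9 the width is the long edge, so a 1920-pixel width gives a height of 1920 × 9 / 16 = 1080. For 3:4 the height is the long edge, so a 2048-pixel height gives a width of 2048 × 3 / 4 = 1536. The sketch below validates a style and ratio and computes such dimensions, for example to size a placeholder or a resized copy. The style identifiers are hypothetical. The resolution Gemini actually returns is decided by the API, not by this code.

```python
from fractions import Fraction
from typing import Tuple

# Hypothetical identifiers for the three styles listed above.
STYLES = {"gradient-glass", "ticket", "vector-illustration"}
# Supported ratios expressed as width / height.
ASPECT_RATIOS = {"16:9": Fraction(16, 9), "3:4": Fraction(3, 4)}


def validate_options(style: str, aspect_ratio: str) -> None:
    """Raise ValueError for a style or aspect ratio outside the supported set."""
    if style not in STYLES:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(STYLES)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio {aspect_ratio!r}; choose from {sorted(ASPECT_RATIOS)}")


def dimensions(aspect_ratio: str, long_side: int) -> Tuple[int, int]:
    """Return (width, height) given the length of the longer edge in pixels."""
    ratio = ASPECT_RATIOS[aspect_ratio]
    if ratio >= 1:
        # Landscape: width is the long edge.
        return long_side, round(long_side / ratio)
    # Portrait: height is the long edge.
    return round(long_side * ratio), long_side


if __name__ == "__main__":
    validate_options("ticket", "3:4")
    print(dimensions("16:9", 1920), dimensions("3:4", 2048))
```

Using Fraction keeps the ratio exact, so the only rounding happens once, when converting to whole pixels.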

Maintenance & Community

The project relies primarily on GitHub Issues and Discussions for community interaction and support. The README does not list maintainers, contributors, or sponsors.

Licensing & Compatibility

The project is released under the MIT License, permitting free use for commercial and non-commercial purposes, modification, and distribution, provided the original license and copyright notice are included.

Limitations & Caveats

Image generation relies on the Gemini API, which incurs a per-image cost and can fail for API-related reasons (network errors, quota limits, service outages). Document input is truncated to the first 1000 characters. Batch processing is not supported natively and requires custom scripting. High-resolution (4K) generation increases both processing time and API cost.
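Because batch processing requires custom scripting, a driver might loop over documents and apply the same 1000-character cut, warning when text is dropped. It can also retry transient API failures with exponential backoff. In this sketch, generate is a hypothetical callback (for example, a wrapper around a Gemini call) that takes text and returns image bytes. The .png output naming and the retry parameters are assumptions. The sketch reads plain-text files only (Markdown, TXT), so PDFs would need a text-extraction step first.

```python
import functools
import random
import time
from pathlib import Path
from typing import Callable, Dict, Iterable

MAX_INPUT_CHARS = 1000  # mirrors the truncation limit described above


def with_retries(fn: Callable[[], bytes], attempts: int = 3, base_delay: float = 1.0) -> bytes:
    """Call fn, retrying on any exception with exponential backoff plus jitter.

    A real version would retry only transient errors (timeouts, rate limits).
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise ValueError("attempts must be >= 1")


def batch_illustrate(
    paths: Iterable[Path],
    generate: Callable[[str], bytes],  # hypothetical text -> image-bytes callback
    out_dir: Path,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> Dict[Path, str]:
    """Illustrate each document; return {path: "ok" or "error: ..."}."""
    out_dir.mkdir(parents=True, exist_ok=True)
    results: Dict[Path, str] = {}
    for path in paths:
        text = path.read_text(encoding="utf-8")
        if len(text) > MAX_INPUT_CHARS:
            print(f"warning: {path.name} truncated to {MAX_INPUT_CHARS} characters")
        excerpt = text[:MAX_INPUT_CHARS]
        try:
            image = with_retries(functools.partial(generate, excerpt), attempts, base_delay)
        except Exception as exc:  # one failed document should not stop the batch
            results[path] = f"error: {exc}"
            continue
        (out_dir / f"{path.stem}.png").write_bytes(image)
        results[path] = "ok"
    return results
```

Recording per-document errors instead of raising keeps one quota or network failure from aborting the whole batch. Keep attempts small, since each retry is another API request.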


Explore Similar Projects

cc-nano-banana by kkoppenhaver

paper-framework-figure-studio-pro by c-narcissus

long_stable_diffusion by sharonzhou

paper-ppt-agent by CRui5in

smart-illustrator by axtonliu

academic-figure-generator by LigphiDonk

papervizagent by google-research

ian-handdrawn-ppt by helloianneo

banana-claude by AgriciDaniel

ian-xiaohei-illustrations by helloianneo

gpt4o-image-prompts by songguoxs

awesome-nanobanana-pro by ZeroLu