ZenCtrl by FotographerAI

GenAI framework for subject-driven image generation

created 4 months ago
344 stars

Top 81.6% on sourcepulse

Project Summary

ZenCtrl is a unified visual content creation framework designed for AI-driven image generation. It enables users to create high-resolution, multi-view, and diverse-scene images from a single subject image without requiring fine-tuning, targeting use cases like product photography, fashion, and advertising.

How It Works

ZenCtrl builds upon OminiControl, enhancing it with improved subject preservation and fine-grained control. It employs a modular approach, integrating preprocessing, control models (shape, pose, mask, camera), editing capabilities (inpainting, outpainting, relighting), and post-processing for tasks like deblurring and color correction. This allows for agentic task composition, orchestrating complex visual workflows.
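A minimal sketch of what such agentic task composition could look like: independent stages (preprocessing, control, editing, post-processing) chained into one workflow. All names here are hypothetical illustrations, not ZenCtrl's actual API.

```python
# Hypothetical sketch: each stage is a callable that transforms an
# "image" (represented here as a plain dict of metadata).
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def make_stage(name: str) -> Stage:
    """Create a stage that records its name in the image's history."""
    def stage(image: Dict) -> Dict:
        # Copy instead of mutating, so stages stay composable.
        return {**image, "history": image.get("history", []) + [name]}
    return stage

def compose(stages: List[Stage]) -> Stage:
    """Chain stages left to right into a single pipeline."""
    def pipeline(image: Dict) -> Dict:
        for stage in stages:
            image = stage(image)
        return image
    return pipeline

# Orchestrate a workflow: preprocess -> control -> edit -> post-process.
workflow = compose([
    make_stage("preprocess"),
    make_stage("mask_control"),
    make_stage("inpaint"),
    make_stage("color_correct"),
])

result = workflow({"subject": "product_photo.png"})
print(result["history"])  # -> ['preprocess', 'mask_control', 'inpaint', 'color_correct']
```

The point of the design is that an orchestrator can reorder, drop, or swap stages per task (e.g., skip inpainting for a pure relighting job) without touching the stages themselves.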

Quick Start & Requirements

  • Install: Clone the repository, create a virtual environment, install PyTorch with CUDA 12.8 support, and run pip install -r requirements.txt.
  • Weights: Download model weights from Hugging Face: https://huggingface.co/fotographerai/zenctrl_tools.
  • Launch: Run python app/gradio_app.py.
  • Prerequisites: Python 3.x, PyTorch with CUDA 12.8, requirements.txt dependencies.
  • Resources: Requires downloading model weights.
  • Demo: Hugging Face Space available: https://huggingface.co/spaces/fotographerai/ZenCtrl.
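The install steps above might look like the following in a shell; the repository URL, directory name, and PyTorch wheel index are assumptions based on the project and CUDA version named in the README, so adjust them to match the actual repository.

```shell
# Assumed repository URL; adjust if the actual path differs.
git clone https://github.com/FotographerAI/ZenCtrl.git
cd ZenCtrl

# Create and activate a virtual environment.
python -m venv .venv
source .venv/bin/activate

# Install PyTorch built against CUDA 12.8 (wheel index per pytorch.org),
# then the project's remaining dependencies.
pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

# After downloading the model weights from
# https://huggingface.co/fotographerai/zenctrl_tools, launch the demo:
python app/gradio_app.py
```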

Highlighted Details

  • Subject-driven generation with foreground fidelity preservation.
  • Supports shape, pose, mask, and camera control without fine-tuning.
  • Modular toolkit for preprocessing, editing, and post-processing.
  • Targets product photography, fashion try-on, and ad creatives.

Maintenance & Community

  • Active development with recent code and model weight releases.
  • Community Discord available for feedback and collaboration.
  • Future plans include video generation and expanded API access.

Licensing & Compatibility

  • The repository is open-source, with model weights available on Hugging Face. No specific license is stated in the README, so verify the licensing terms in the repository before commercial or production use.

Limitations & Caveats

Current models perform best with objects and humans, with resolution capped at 1024x1024. Performance with illustrations is limited, and video generation is still under development. The models were not trained on large-scale, diverse datasets, impacting quality and variation for stylized content.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 271 stars in the last 90 days
