ZenCtrl by FotographerAI

GenAI framework for subject-driven image generation

created 4 months ago
344 stars

Top 81.6% on sourcepulse

Project Summary

ZenCtrl is a unified visual content creation framework designed for AI-driven image generation. It enables users to create high-resolution, multi-view, and diverse-scene images from a single subject image without requiring fine-tuning, targeting use cases like product photography, fashion, and advertising.

How It Works

ZenCtrl builds upon OminiControl, enhancing it with improved subject preservation and fine-grained control. It employs a modular approach, integrating preprocessing, control models (shape, pose, mask, camera), editing capabilities (inpainting, outpainting, relighting), and post-processing for tasks like deblurring and color correction. This allows for agentic task composition, orchestrating complex visual workflows.
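A minimal sketch of what such agentic task composition could look like: independent stages (preprocessing, control, editing, post-processing) chained into one workflow. All names here are hypothetical illustrations, not ZenCtrl's actual API.

```python
# Hypothetical sketch: each stage is a callable that transforms an
# "image" (represented here as a plain dict of metadata).
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def make_stage(name: str) -> Stage:
    """Create a stage that records its name in the image's history."""
    def stage(image: Dict) -> Dict:
        # Copy instead of mutating, so stages stay composable.
        return {**image, "history": image.get("history", []) + [name]}
    return stage

def compose(stages: List[Stage]) -> Stage:
    """Chain stages left to right into a single pipeline."""
    def pipeline(image: Dict) -> Dict:
        for stage in stages:
            image = stage(image)
        return image
    return pipeline

# Orchestrate a workflow: preprocess -> control -> edit -> post-process.
workflow = compose([
    make_stage("preprocess"),
    make_stage("mask_control"),
    make_stage("inpaint"),
    make_stage("color_correct"),
])

result = workflow({"subject": "product_photo.png"})
print(result["history"])  # -> ['preprocess', 'mask_control', 'inpaint', 'color_correct']
```

The point of the design is that an orchestrator can reorder, drop, or swap stages per task (e.g., skip inpainting for a pure relighting job) without touching the stages themselves.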

Quick Start & Requirements

  • Install: Clone the repository, create a virtual environment, install PyTorch with CUDA 12.8 support, and run pip install -r requirements.txt.
  • Weights: Download model weights from Hugging Face: https://huggingface.co/fotographerai/zenctrl_tools.
  • Launch: Run python app/gradio_app.py.
  • Prerequisites: Python 3.x, PyTorch with CUDA 12.8, requirements.txt dependencies.
  • Resources: Requires downloading model weights.
  • Demo: Hugging Face Space available: https://huggingface.co/spaces/fotographerai/ZenCtrl.
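The install steps above might look like the following in a shell; the repository URL, directory name, and PyTorch wheel index are assumptions based on the project and CUDA version named in the README, so adjust them to match the actual repository.

```shell
# Assumed repository URL; adjust if the actual path differs.
git clone https://github.com/FotographerAI/ZenCtrl.git
cd ZenCtrl

# Create and activate a virtual environment.
python -m venv .venv
source .venv/bin/activate

# Install PyTorch built against CUDA 12.8 (wheel index per pytorch.org),
# then the project's remaining dependencies.
pip install torch --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt

# After downloading the model weights from
# https://huggingface.co/fotographerai/zenctrl_tools, launch the demo:
python app/gradio_app.py
```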

Highlighted Details

  • Subject-driven generation with foreground fidelity preservation.
  • Supports shape, pose, mask, and camera control without fine-tuning.
  • Modular toolkit for preprocessing, editing, and post-processing.
  • Targets product photography, fashion try-on, and ad creatives.

Maintenance & Community

  • Active development with recent code and model weight releases.
  • Community Discord available for feedback and collaboration.
  • Future plans include video generation and expanded API access.

Licensing & Compatibility

  • The repository is open-source, with model weights available on Hugging Face. No specific license is stated in the README, so verify the licensing terms in the repository before commercial or production use.

Limitations & Caveats

Current models perform best with objects and humans, with resolution capped at 1024x1024. Performance with illustrations is limited, and video generation is still under development. The models were not trained on large-scale, diverse datasets, impacting quality and variation for stylized content.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 271 stars in the last 90 days
