CBIT-AiStudio by reneverland

Enterprise-grade AI image generation platform for photorealistic humans

Created 3 months ago
252 stars

Top 99.6% on SourcePulse

Project Summary

This project provides an enterprise-level AI image generation platform, BaiduCBIT, built on ComfyUI, specializing in photorealistic human image synthesis. It targets users requiring high-fidelity visual content, offering an end-to-end solution from text prompts to detailed, realistic images through its advanced Flux model architecture and integrated AI enhancement technologies. The platform aims to streamline complex generation workflows with a user-friendly interface and robust backend.

How It Works

BaiduCBIT leverages the Flux 1.0 Dev Model, a 12B parameter Diffusion Transformer architecture, enhanced with FP8_E4M3FN quantization for memory efficiency. It employs a Dual-CLIP architecture for robust text understanding and integrates specialized LoRA modules for fine-tuning details like hands and realism. Generation utilizes a dual-stage sampling strategy with adaptive guidance and ControlNet for precise control. A multi-level post-processing pipeline refines output for photographic quality. The system supports both production-ready distributed deployment and local development environments.
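
To put the memory-efficiency claim in rough numbers, the sketch below estimates weight storage for a 12B-parameter model at several precisions. It counts weights only (1 byte per parameter for FP8, 2 for FP16/BF16, 4 for FP32) and ignores activations, the text encoders, the VAE, and any LoRA weights, so real VRAM use will be higher. The name `weight_memory_gib` is illustrative and is not taken from the project.

```python
# Rough weight-only memory estimate. Names are illustrative, not from the project.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "fp8_e4m3fn": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the storage needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 12_000_000_000  # 12B parameters, as stated above
    for dtype in ("fp32", "bf16", "fp8_e4m3fn"):
        print(f"{dtype:>11}: {weight_memory_gib(params, dtype):.1f} GiB")
```

Under these assumptions, FP8 halves weight storage relative to BF16, from about 22.4 GiB to about 11.2 GiB.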

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies (pip install -r requirements.txt), configure environment variables (cp env.example .env), and start the service (python run.py).
  • Prerequisites: Linux (Ubuntu 20.04+ recommended), Python 3.12+, NVIDIA GPU (RTX 3080+ with 8GB+ VRAM), CUDA 12.4+, 16GB+ RAM, 100GB+ storage.
  • Links: GitHub Repository.
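
Before running the installation steps, it may help to check the host against the listed prerequisites. The following is a minimal sketch, not part of the project: `check_prerequisites` is a hypothetical helper, it treats the "100GB+ storage" figure as free disk space, and it only looks for `nvidia-smi` on the PATH as a hint that an NVIDIA driver is installed. It does not verify the CUDA version, VRAM, or RAM.

```python
import shutil
import sys

# Minimums taken from the prerequisites listed above.
MIN_PYTHON = (3, 12)
MIN_FREE_DISK_GB = 100  # assumption: the storage figure means free space


def check_prerequisites(python_version, free_disk_gb, has_nvidia_smi):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"{MIN_FREE_DISK_GB} GB+ free storage required")
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found on PATH; NVIDIA driver may be missing")
    return problems


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1e9
    found = shutil.which("nvidia-smi") is not None
    for problem in check_prerequisites(sys.version_info, free_gb, found):
        print("WARNING:", problem)
```

Passing the measured values in as arguments keeps the check itself easy to test without a GPU host.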

Highlighted Details

  • Achieves photorealistic human image generation with detailed skin, facial features, hair, and lighting.
  • Includes specialized LoRA modules for hand repair and realism enhancement.
  • Supports multilingual prompt translation via Bing Translator API.
  • Offers precise control over generation using various ControlNet types (Tile, Depth, Pose, Edge).
  • Optimized for performance with FP8 quantization, achieving ~15-30 second generation times on an RTX 4090 for 30 steps.

Maintenance & Community

  • Maintainer: BaiduCBIT Team.
  • Developer: @reneverland.
  • Support: Primarily through GitHub Issues.
  • Roadmap: Features in development include video generation and batch processing optimization. Planned features include support for SDXL/SD3.5, real-time preview, and user permission management.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license is permissive, generally allowing commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under active development, with key features such as video generation and broader model support (SDXL, SD3.5) still in the planning or development stages. CUDA 12.4 or later is required. The demo video is not included in the repository.

Health Check

  • Last Commit: 3 months ago.
  • Responsiveness: Inactive.
  • Pull Requests (30d): 0.
  • Issues (30d): 0.
  • Star History: 111 stars in the last 30 days.

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

IP-Adapter by tencent-ailab

0.2%
6k
Adapter for image prompt in text-to-image diffusion models
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Chaoyu Yang (Founder of Bento), and 12 more.

IF by deep-floyd

0%
8k
Text-to-image model for photorealistic synthesis and language understanding
Created 3 years ago
Updated 1 year ago