CBIT-AiStudio by reneverland

Enterprise-grade AI image generation platform for photorealistic humans

Created 4 months ago
253 stars

Top 99.3% on SourcePulse

Project Summary

This project provides BaiduCBIT, an enterprise-level AI image generation platform built on ComfyUI that specializes in photorealistic human image synthesis. It targets users who need high-fidelity visual content and covers the full path from text prompt to detailed, realistic image, using a Flux model architecture and integrated AI enhancement steps. The platform aims to streamline complex generation workflows with a user-friendly interface and a robust backend.

How It Works

BaiduCBIT leverages the Flux 1.0 Dev Model, a 12B parameter Diffusion Transformer architecture, enhanced with FP8_E4M3FN quantization for memory efficiency. It employs a Dual-CLIP architecture for robust text understanding and integrates specialized LoRA modules for fine-tuning details like hands and realism. Generation utilizes a dual-stage sampling strategy with adaptive guidance and ControlNet for precise control. A multi-level post-processing pipeline refines output for photographic quality. The system supports both production-ready distributed deployment and local development environments.
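
To make the memory-efficiency claim concrete, here is a rough sketch that is not taken from the project. It estimates the weights-only footprint of a 12B-parameter model at 16-bit versus 8-bit precision, and decodes the FP8 E4M3FN bit layout (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 7, no infinities). The function names are illustrative, and the estimate deliberately ignores activations, the text encoders, and the VAE.

```python
import math

PARAMS = 12e9  # parameter count quoted above for the Flux 1.0 Dev model


def weight_gib(num_params: float, bytes_per_param: int) -> float:
    """Weights-only footprint in GiB (ignores activations and other models)."""
    return num_params * bytes_per_param / 2**30


def decode_e4m3fn(byte: int) -> float:
    """Decode one FP8 E4M3FN byte into a Python float."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F
    mantissa = byte & 0x07
    if exponent == 0x0F and mantissa == 0x07:
        return math.nan  # the only NaN pattern; E4M3FN has no infinities
    if exponent == 0:
        return sign * (mantissa / 8) * 2.0**-6  # subnormal range
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 7)


if __name__ == "__main__":
    print(f"16-bit weights: {weight_gib(PARAMS, 2):.1f} GiB")
    print(f"FP8 weights:    {weight_gib(PARAMS, 1):.1f} GiB")
    print(f"Largest finite E4M3FN value: {decode_e4m3fn(0x7E)}")
```

Storing each weight in one byte instead of two halves the weights-only footprint. The price is a narrow format with only three mantissa bits and a largest finite value of 448.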

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies (pip install -r requirements.txt), configure environment variables (cp env.example .env), and start the service (python run.py).
  • Prerequisites: Linux (Ubuntu 20.04+ recommended), Python 3.12+, NVIDIA GPU (RTX 3080+ with 8GB+ VRAM), CUDA 12.4+, 16GB+ RAM, 100GB+ storage.
  • Links: GitHub Repository.
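
Before running the installation steps, some of the prerequisites above can be checked with a short script. This is a minimal sketch and not part of the repository: the function name `check_prerequisites` is hypothetical, the 100GB figure is treated as free disk space, and the check covers only the Python version, disk space, and the presence of `nvidia-smi`. It does not verify the GPU model, VRAM, CUDA version, or system RAM.

```python
import shutil
import sys

# Minimums copied from the prerequisites listed above.
MIN_PYTHON = (3, 12)
MIN_FREE_DISK_GB = 100  # assumption: the storage figure means free space


def check_prerequisites(python_version, free_disk_gb, has_nvidia_smi):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.12 or newer is required")
    if free_disk_gb < MIN_FREE_DISK_GB:
        problems.append(f"only {free_disk_gb:.0f} GB free, need {MIN_FREE_DISK_GB} GB")
    if not has_nvidia_smi:
        problems.append("nvidia-smi not found on PATH (NVIDIA driver missing?)")
    return problems


if __name__ == "__main__":
    free_gb = shutil.disk_usage(".").free / 1e9
    found = check_prerequisites(
        sys.version_info, free_gb, shutil.which("nvidia-smi") is not None
    )
    for problem in found:
        print("WARNING:", problem)
    if not found:
        print("Basic prerequisites look satisfied.")
```

Keeping the check as a pure function that takes the measured values as arguments makes it easy to test without a GPU machine.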

Highlighted Details

  • Achieves photorealistic human image generation with detailed skin, facial features, hair, and lighting.
  • Includes specialized LoRA modules for hand repair and realism enhancement.
  • Supports multilingual prompt translation via Bing Translator API.
  • Offers precise control over generation using various ControlNet types (Tile, Depth, Pose, Edge).
  • Optimized for performance with FP8 quantization, achieving ~15-30 second generation times on an RTX 4090 for 30 steps.
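
The quoted generation time can be turned into rough per-step and throughput figures. The sketch below is simple arithmetic under stated assumptions: the 15-30 second range is taken to be per image, images are generated one at a time on a single GPU, and any fixed overhead (model loading, post-processing) is counted as sampling time, so the per-step value is an upper bound.

```python
STEPS = 30                    # sampling steps quoted above
SECONDS_RANGE = (15.0, 30.0)  # quoted generation time on an RTX 4090


def seconds_per_step(total_seconds: float, steps: int) -> float:
    """Upper bound on time per sampling step, overhead included."""
    return total_seconds / steps


def images_per_hour(total_seconds: float) -> float:
    """Sequential throughput on a single GPU."""
    return 3600.0 / total_seconds


if __name__ == "__main__":
    for total in SECONDS_RANGE:
        print(
            f"{total:.0f} s/image: "
            f"{seconds_per_step(total, STEPS):.2f} s/step, "
            f"{images_per_hour(total):.0f} images/hour"
        )
```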

Maintenance & Community

  • Maintainer: BaiduCBIT Team.
  • Developer: @reneverland.
  • Support: Primarily through GitHub Issues.
  • Roadmap: Features in development include video generation and batch processing optimization. Planned features include support for SDXL/SD3.5, real-time preview, and user permission management.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license is permissive, generally allowing commercial use and integration into closed-source projects.

Limitations & Caveats

The project is under active development, and key features such as video generation and broader model support (SDXL, SD3.5) are still in the planning or development stages. CUDA 12.4 or newer is required. The demo video is not included in the repository.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

IP-Adapter by tencent-ailab

0.1%
6k
Adapter for image prompt in text-to-image diffusion models
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Chaoyu Yang (Founder of Bento), and 12 more.

IF by deep-floyd

0%
8k
Text-to-image model for photorealistic synthesis and language understanding
Created 3 years ago
Updated 1 year ago