comfyui_dagthomas by dagthomas

ComfyUI extension for advanced prompt/image processing

Created 2 years ago
264 stars

Top 96.8% on SourcePulse

Project Summary

This ComfyUI extension provides advanced prompt generation and image analysis for users who want to extend their AI image-generation workflows. It offers nodes for GPT-4-powered text generation, image description via GPT-4 Vision, local LLM integration through Ollama, and structured, category-based dynamic prompt generation.

How It Works

The extension adds several custom nodes. PromptGenerator and APNextNode build structured, randomized prompts from elements stored in user-defined JSON files organized into categories. GPT4VisionNode sends images to GPT-4 Vision and returns detailed descriptions, with options that control the level of detail and the output length. GPT4MiniNode and OllamaNode generate text with OpenAI's GPT-4 and local Ollama models, respectively; both support custom base prompts and output formatting. A PGSD3LatentGenerator node is also included for creating Stable Diffusion 3 latents.
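To make the custom-node mechanism concrete, here is a minimal sketch of a ComfyUI node that picks one entry per category, assuming hardcoded in-memory categories. The node name `RandomCategoryPrompt`, its inputs, and the category contents are hypothetical and not taken from comfyui_dagthomas; only the `INPUT_TYPES` / `RETURN_TYPES` / `FUNCTION` / `NODE_CLASS_MAPPINGS` pattern is ComfyUI's standard custom-node interface.

```python
import random

# Hypothetical categories held in memory; the real nodes read theirs from JSON files.
CATEGORIES = {
    "subject": ["a lighthouse", "an old fisherman", "a red fox"],
    "style": ["oil painting", "35mm photograph", "watercolor"],
    "lighting": ["golden hour", "overcast light", "neon glow"],
}


class RandomCategoryPrompt:
    """Hypothetical example node, not part of comfyui_dagthomas."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this to build the node's input widgets and sockets.
        return {
            "required": {
                "base_text": ("STRING", {"default": "", "multiline": True}),
                "seed": ("INT", {"default": 0, "min": 0, "max": 0xFFFFFFFF}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate"  # name of the method ComfyUI calls when the node runs
    CATEGORY = "examples/prompt"

    def generate(self, base_text, seed):
        rng = random.Random(seed)  # same seed -> same prompt (repeatable output)
        parts = [base_text.strip()] if base_text.strip() else []
        for name in sorted(CATEGORIES):  # fixed order keeps the output deterministic
            parts.append(rng.choice(CATEGORIES[name]))
        # ComfyUI expects a tuple whose length matches RETURN_TYPES.
        return (", ".join(parts),)


# ComfyUI discovers nodes through these mappings, typically exported from the package's __init__.py.
NODE_CLASS_MAPPINGS = {"RandomCategoryPrompt": RandomCategoryPrompt}
NODE_DISPLAY_NAME_MAPPINGS = {"RandomCategoryPrompt": "Random Category Prompt"}
```

Seeding a local `random.Random` instance, rather than the global generator, keeps the output repeatable for a given seed without disturbing other code that uses `random`.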

Quick Start & Requirements

  • Installation: Clone or copy the repository into ComfyUI's custom_nodes directory, then restart ComfyUI.
  • OpenAI API Key: Required for GPT4VisionNode and GPT4MiniNode. Set it as an environment variable: OPENAI_API_KEY=sk-your-api-key-here.
  • Ollama: Required for OllamaNode; a local Ollama server must be running.
  • Dependencies: Install the additional Python packages imported by the node source files.
  • Custom Categories: Create JSON files in comfyui_dagthomas/data/next/[CATEGORY_NAME]/ for APNextNode customization.
  • Example Workflow: Download apntest.json.
  • Documentation: Detailed documentation is in progress.
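The custom-category layout above can be read with a short loader. This sketch assumes, hypothetically, that every `.json` file in a category folder holds a flat list of strings; the schema APNextNode actually expects may differ, so treat `load_categories` (a hypothetical helper) as illustrative only.

```python
import json
from pathlib import Path


def load_categories(root):
    """Map each category folder under root to the combined entries of its JSON files.

    Assumed (hypothetical) layout: root/<CATEGORY_NAME>/*.json, where each file
    contains a flat JSON list of strings.
    """
    categories = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the category folders
        entries = []
        for json_file in sorted(folder.glob("*.json")):
            with json_file.open(encoding="utf-8") as fh:
                entries.extend(json.load(fh))
        if entries:  # skip folders that contain no entries
            categories[folder.name] = entries
    return categories
```

Called as `load_categories("ComfyUI/custom_nodes/comfyui_dagthomas/data/next")` (path assumed from the layout above), it returns a dict keyed by category folder name.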

Highlighted Details

  • Dynamic Prompt Generation: APNextNode allows users to define custom categories and fields via JSON files, enabling highly flexible and repeatable prompt construction.
  • GPT-4 Vision Integration: Enables image-to-text analysis for detailed image descriptions, useful for prompt seeding or content moderation.
  • Local LLM Support: OllamaNode allows integration with local language models, offering an alternative to cloud-based APIs.
  • SD3 Latent Generation: Includes a node specifically for generating latents compatible with Stable Diffusion 3 pipelines.
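For the local-LLM path, a request to a running Ollama server can be sketched with the standard library alone. This uses Ollama's `/api/generate` endpoint on its default port 11434 with `stream` disabled; how OllamaNode itself combines its base prompt with the input text is not described here, so the simple concatenation in the hypothetical `build_ollama_request` helper is an assumption, as is the example model name.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_ollama_request(model, base_prompt, user_text):
    """Build a non-streaming /api/generate request body (prompt layout is assumed)."""
    payload = {
        "model": model,
        "prompt": f"{base_prompt}\n\n{user_text}",
        "stream": False,  # ask for a single JSON object instead of streamed chunks
    }
    return json.dumps(payload).encode("utf-8")


def generate_with_ollama(model, base_prompt, user_text, timeout=120):
    """POST the request to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_ollama_request(model, base_prompt, user_text),  # data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, `generate_with_ollama("llama3", "Expand this into a detailed image prompt:", "a foggy harbor at dawn")` would work only if that model has already been pulled locally.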

Maintenance & Community

  • The project is marked as "beta" with documentation in progress.
  • No specific community links (Discord, Slack) or notable contributors are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • The project is in beta, and detailed documentation is still being developed.
  • Functionality relies heavily on correctly structured JSON files for custom categories.
  • Using the OpenAI nodes requires an API key and incurs per-request API charges.
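Because malformed category files are a likely failure point, a small pre-flight check can catch them before a workflow runs. The sketch below validates one file's text against the same hypothetical schema (a flat JSON list of unique, non-empty strings); `validate_category_text` is a hypothetical helper, and its checks should be adjusted to whatever structure APNextNode actually expects.

```python
import json


def validate_category_text(name, text):
    """Return a list of problems in one category file's text; an empty list means valid.

    Hypothetical schema: a flat JSON list of unique, non-empty strings.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"{name}: invalid JSON ({exc.msg} at line {exc.lineno})"]
    if not isinstance(data, list):
        return [f"{name}: expected a JSON list, got {type(data).__name__}"]
    if not data:
        return [f"{name}: list is empty"]
    problems = []
    seen = set()
    for i, item in enumerate(data):
        if not isinstance(item, str) or not item.strip():
            problems.append(f"{name}[{i}]: expected a non-empty string, got {item!r}")
        elif item in seen:
            problems.append(f"{name}[{i}]: duplicate entry {item!r}")
        else:
            seen.add(item)
    return problems
```

Running it over every `*.json` file under the category directory and printing any non-empty results gives a quick report before loading a workflow.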

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days

Explore Similar Projects

big-sleep by lucidrains

CLI tool for text-to-image generation

Created 4 years ago
Updated 3 years ago
3k stars

Starred by Max Howell (author of Homebrew), Shizhe Diao (author of LMFlow; Research Scientist at NVIDIA), and 1 more.