GPT4V-Image-Captioner by jiayev

Image processing toolbox for image tagging

Created 2 years ago

854 stars

Top 41.7% on SourcePulse

Project Summary

This project provides a versatile, Gradio-based image captioning and tagging toolbox for researchers and power users. It simplifies the process of generating descriptive tags and captions for images using various large vision-language models, including GPT-4-vision, Claude 3, CogVLM, Qwen-VL, and Moondream.

How It Works

The application leverages multiple state-of-the-art vision-language models (VLMs) to analyze images and generate relevant tags and captions. Users can choose between cloud-based APIs (OpenAI's GPT-4-vision, Claude 3, Alibaba's Qwen-VL) or locally hosted models (CogVLM, Moondream). This flexibility allows users to balance cost, performance, and data privacy based on their specific needs. The Gradio interface provides a user-friendly way to manage single-image or batch processing, with additional features like visual tag analysis and keyword filtering.

Quick Start & Requirements

Installation: Run install_windows.bat (Windows) or ./install_linux_mac.sh (Linux/macOS) after cloning the repository.
Prerequisites: Python 3.x, Git. API keys for cloud-based models (OpenAI, Claude 3, Qwen-VL).
Launch: Run start_windows.bat (Windows) or ./start_linux_mac.sh (Linux/macOS).
Docs: https://github.com/jiayev/GPT4V-Image-Captioner

Highlighted Details

Supports multiple VLMs: GPT-4-vision, Claude 3, CogVLM, Qwen-VL, Moondream.
Batch processing for multiple images.
Visual tag analysis and keyword filtering.
Image pre-compression and watermark recognition.

Maintenance & Community

The project welcomes community contributions for new features. The Claude 3 feature is noted as unfinished.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration with closed-source projects.

Limitations & Caveats

The README indicates that the Claude 3 integration is not yet complete. Users will need to manually configure API keys and endpoints for cloud-based models.

GPT4V-Image-Captioner by jiayev

Explore Similar Projects

NextStep-1 by stepfun-ai

ComfyUI-OmniGen by 1038lab

Comfyui_CXH_joy_caption by StartHua

Semi-Auto-NovelAI-to-Pixiv by zhulinyv

taggui by jhc13

rclip by yurijmikhalevich

BLIP3o by JiuhaiChen

rawpy by letmaik

Ovis by AIDC-AI

stable-diffusion-webui-wd14-tagger by kawalain

StableCascade by Stability-AI

stable-diffusion by CompVis