GPT4V-Image-Captioner by jiayev

Image processing toolbox for image tagging

created 1 year ago
840 stars

Top 43.4% on sourcepulse

Project Summary

This project is a Gradio-based toolbox for captioning and tagging images, aimed at researchers and power users. It generates descriptive tags and captions using several large vision-language models, including GPT-4-vision, Claude 3, CogVLM, Qwen-VL, and Moondream.

How It Works

The application sends images to a vision-language model (VLM) of the user's choice and uses the model's output as tags and captions. Users can choose between cloud-based APIs (OpenAI's GPT-4-vision, Anthropic's Claude 3, Alibaba's Qwen-VL) and locally hosted models (CogVLM, Moondream). This lets users trade off cost, performance, and data privacy: cloud APIs run inference remotely but send images to a third party, while local models keep images on the user's own machine. The Gradio interface handles both single-image and batch processing, and adds visual tag analysis and keyword filtering.
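
The backend choice and batch flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: `caption_batch`, `BACKENDS`, and the two stub backends are hypothetical names, the stubs return placeholder text instead of calling any real model, and writing each caption to a `.txt` file next to its image is an assumed output convention.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

# A backend takes an image path and a prompt and returns caption/tag text.
Captioner = Callable[[Path, str], str]


def fake_cloud_backend(image: Path, prompt: str) -> str:
    # Hypothetical stand-in for a cloud API call (e.g. GPT-4-vision); returns placeholder text.
    return f"cloud caption for {image.name}"


def fake_local_backend(image: Path, prompt: str) -> str:
    # Hypothetical stand-in for a locally hosted model (e.g. Moondream); returns placeholder text.
    return f"local caption for {image.name}"


# Hypothetical registry mapping a user-facing choice to a backend function.
BACKENDS: Dict[str, Captioner] = {
    "cloud": fake_cloud_backend,
    "local": fake_local_backend,
}

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def caption_batch(folder: Path, backend: str, prompt: str) -> List[Path]:
    """Caption every image in `folder` and write each result to a sibling .txt file."""
    captioner = BACKENDS[backend]  # raises KeyError for an unknown backend name
    written: List[Path] = []
    for image in sorted(folder.iterdir()):  # list is built before any .txt is written
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip files that are not images
        caption = captioner(image, prompt)
        out = image.with_suffix(".txt")  # assumed sidecar convention: cat.png -> cat.txt
        out.write_text(caption, encoding="utf-8")
        written.append(out)
    return written


# Usage: caption a throwaway folder containing one image and one non-image file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "cat.png").write_bytes(b"")  # empty placeholder; the stubs never read pixels
    (folder / "notes.md").write_text("not an image", encoding="utf-8")
    outputs = caption_batch(folder, "local", "Describe this image as comma-separated tags.")
    caption_text = outputs[0].read_text(encoding="utf-8")
    print([p.name for p in outputs])  # ['cat.txt']
    print(caption_text)  # local caption for cat.png
```

Keeping every backend behind one function signature is what allows a single batch loop to serve both cloud and local models; connecting a real client would only change the registry entry.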

Quick Start & Requirements

  • Installation: Run install_windows.bat (Windows) or ./install_linux_mac.sh (Linux/macOS) after cloning the repository.
  • Prerequisites: Python 3.x and Git, plus API keys for any cloud services you plan to use (OpenAI, Anthropic Claude 3, Alibaba Qwen-VL).
  • Launch: Run start_windows.bat (Windows) or ./start_linux_mac.sh (Linux/macOS).
  • Docs: https://github.com/jiayev/GPT4V-Image-Captioner

Highlighted Details

  • Supports multiple VLMs: GPT-4-vision, Claude 3, CogVLM, Qwen-VL, Moondream.
  • Batch processing for multiple images.
  • Visual tag analysis and keyword filtering.
  • Image pre-compression and watermark recognition.
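
Tag analysis and keyword filtering can be illustrated with a small sketch. It assumes each caption is a comma-separated tag string (a common convention for image-tagging datasets, not confirmed for this project); `parse_tags`, `tag_frequencies`, and `filter_tags` are hypothetical helpers, not the project's API.

```python
from collections import Counter
from typing import Iterable, List


def parse_tags(caption: str) -> List[str]:
    """Split a comma-separated tag string into trimmed, lowercased tags, dropping empties."""
    return [t.strip().lower() for t in caption.split(",") if t.strip()]


def tag_frequencies(captions: Iterable[str]) -> Counter:
    """Count how often each tag appears across all captions (the basis for tag analysis)."""
    counts: Counter = Counter()
    for caption in captions:
        counts.update(parse_tags(caption))
    return counts


def filter_tags(caption: str, banned: Iterable[str]) -> str:
    """Remove banned keywords from one caption and re-join the remaining tags."""
    banned_set = {b.strip().lower() for b in banned}
    kept = [t for t in parse_tags(caption) if t not in banned_set]
    return ", ".join(kept)


captions = [
    "1girl, red hair, watermark, smiling",
    "landscape, mountains, watermark",
    "1girl, blue eyes",
]
print(tag_frequencies(captions).most_common(2))
print(filter_tags(captions[0], ["watermark"]))  # 1girl, red hair, smiling
```

Counting first and filtering second mirrors a typical workflow: frequent unwanted tags (such as "watermark" here) show up at the top of the counts and can then be passed to the filter.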

Maintenance & Community

The project welcomes community contributions of new features. The README marks the Claude 3 integration as unfinished.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Users should verify licensing for commercial use or integration with closed-source projects.

Limitations & Caveats

The README indicates that the Claude 3 integration is not yet complete. Users will need to manually configure API keys and endpoints for cloud-based models.
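
One way to keep API keys and endpoints out of source code is to read them from environment variables, sketched below. The project's own configuration mechanism may differ; `ApiConfig`, `load_api_config`, and the `VLM_API_ENDPOINT` variable are hypothetical. `OPENAI_API_KEY` is the variable name the official OpenAI SDKs read by default, and the default endpoint is OpenAI's Chat Completions URL.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# OpenAI's Chat Completions endpoint, used when no override is given.
DEFAULT_ENDPOINT = "https://api.openai.com/v1/chat/completions"


@dataclass(frozen=True)
class ApiConfig:
    api_key: str
    endpoint: str


def load_api_config(env: Mapping[str, str] = os.environ) -> ApiConfig:
    """Build a config from environment variables; VLM_API_ENDPOINT is a hypothetical override."""
    key: Optional[str] = env.get("OPENAI_API_KEY")
    if not key:
        # Fail early with a clear message instead of sending an unauthenticated request.
        raise RuntimeError("OPENAI_API_KEY is not set")
    endpoint = env.get("VLM_API_ENDPOINT", DEFAULT_ENDPOINT)
    return ApiConfig(api_key=key, endpoint=endpoint)
```

Passing the environment as a parameter (defaulting to `os.environ`) makes the function easy to check with a plain dictionary, for example pointing `VLM_API_ENDPOINT` at an OpenAI-compatible proxy.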

Health Check

Last commit: 6 months ago
Responsiveness: 1 day
Pull requests (30d): 0
Issues (30d): 1
Star history: 19 stars in the last 90 days
