joycaption  by fpgaminer

Image captioning VLM for diffusion model training, aiming for uncensored, open use

created 9 months ago
733 stars

Top 48.2% on sourcepulse

GitHubView on GitHub
Project Summary

JoyCaption is an open-source Visual Language Model (VLM) designed for generating uncensored image captions, primarily aimed at users training diffusion models. It offers broad content and style coverage, including NSFW concepts, and provides detailed training scripts for community use.

How It Works

JoyCaption is built upon the Llama 3.1 architecture, fine-tuned for image captioning. It leverages a multimodal approach, processing both image and text inputs to generate descriptive captions. The model is designed to be uncensored and aims to match or exceed the performance of proprietary models like GPT-4o in captioning quality, particularly outside the SFW domain.

Quick Start & Requirements

  • Install/Run: Load model via Hugging Face transformers library.
  • Prerequisites: Python, PyTorch (bfloat16 support), transformers library. GPU recommended for inference.
  • Demo: Available on HuggingFace Spaces.
  • Docs: Example usage and prompt details provided in the README.

Highlighted Details

  • Uncensored: Explicitly trained to cover NSFW concepts without filtering.
  • Versatile Prompting: Supports multiple captioning styles including descriptive, Stable Diffusion prompts, MidJourney prompts, Booru tags, and art critiques.
  • Community Focus: Aims to be free, open-source, with released training scripts and detailed build information.
  • Performance Goal: Targets near GPT-4o captioning performance.

Maintenance & Community

The project is actively developed, currently at "Alpha Two." Feedback and contributions are encouraged. Release history and announcements are linked via Reddit and Civitai.

Licensing & Compatibility

The model weights are released under an open, free license with no restrictions. Compatibility for commercial use or closed-source linking is implied by the "no restrictions" claim.

Limitations & Caveats

JoyCaption is an experimental alpha release and not production-ready. Known limitations include potential issues with character interactions, OCR, and left/right confusion. The model is heavily optimized for specific prompt formats, and results may vary with general instructions.

Health Check
Last commit

1 week ago

Responsiveness

1 day

Pull Requests (30d)
1
Issues (30d)
3
Star History
303 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.