joycaption by fpgaminer

Image captioning VLM for diffusion model training, aiming for uncensored, open use

Created 11 months ago
833 stars

Top 42.7% on SourcePulse

View on GitHub
Project Summary

JoyCaption is an open-source Visual Language Model (VLM) designed for generating uncensored image captions, primarily aimed at users training diffusion models. It offers broad content and style coverage, including NSFW concepts, and provides detailed training scripts for community use.

How It Works

JoyCaption is built upon the Llama 3.1 architecture, fine-tuned for image captioning. It leverages a multimodal approach, processing both image and text inputs to generate descriptive captions. The model is designed to be uncensored and aims to match or exceed the performance of proprietary models like GPT-4o in captioning quality, particularly outside the SFW domain.

Quick Start & Requirements

  • Install/Run: Load the model via the Hugging Face transformers library.
  • Prerequisites: Python, PyTorch (with bfloat16 support), and the transformers library; a GPU is recommended for inference.
  • Demo: Available on Hugging Face Spaces.
  • Docs: Example usage and prompt details are provided in the README.

Highlighted Details

  • Uncensored: Explicitly trained to cover NSFW concepts without filtering.
  • Versatile Prompting: Supports multiple captioning styles including descriptive, Stable Diffusion prompts, MidJourney prompts, Booru tags, and art critiques.
  • Community Focus: Aims to be free, open-source, with released training scripts and detailed build information.
  • Performance Goal: Targets near GPT-4o captioning performance.
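Since the model is heavily tuned on specific prompt formats, it can help to keep the supported styles in one place. The helper below is illustrative only: the instruction wording is an assumption, not the trained prompt set, which is documented in the README.

```python
# Illustrative mapping of the caption styles the project supports to
# instruction strings. The exact wording JoyCaption was trained on differs;
# use the official prompts from the README for best results.
CAPTION_STYLES = {
    "descriptive": "Write a detailed description for this image.",
    "stable_diffusion": "Write a Stable Diffusion prompt for this image.",
    "midjourney": "Write a MidJourney prompt for this image.",
    "booru_tags": "Write a list of Booru tags for this image.",
    "art_critic": "Analyze this image like an art critic would.",
}


def style_prompt(style: str) -> str:
    """Return the instruction string for a named caption style."""
    try:
        return CAPTION_STYLES[style]
    except KeyError:
        raise ValueError(
            f"Unknown style {style!r}; choose from {sorted(CAPTION_STYLES)}"
        ) from None
```

Keeping prompts centralized like this makes it easy to swap in the exact trained wording later without touching call sites.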

Maintenance & Community

The project is actively developed, currently at "Alpha Two." Feedback and contributions are encouraged. Release history and announcements are linked via Reddit and Civitai.

Licensing & Compatibility

The model weights are released under an open, free license with no restrictions. The "no restrictions" claim implies commercial use and closed-source integration are permitted.

Limitations & Caveats

JoyCaption is an experimental alpha release and not production-ready. Known limitations include potential issues with character interactions, OCR, and left/right confusion. The model is heavily optimized for specific prompt formats, and results may vary with general instructions.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 46 stars in the last 30 days

Explore Similar Projects

Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Douwe Kiela (Cofounder of Contextual AI), and 1 more.

lens by ContextualAI

0.3%
353
Vision-language research paper using LLMs
Created 2 years ago
Updated 1 month ago
Starred by Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), Travis Fischer (Founder of Agentic), and 5 more.

fromage by kohjingyu

0%
482
Multimodal model for grounding language models to images
Created 2 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

CLIP_prefix_caption by rmokady

0.1%
1k
Image captioning model using CLIP embeddings as a prefix
Created 4 years ago
Updated 1 year ago
Starred by Max Howell (Author of Homebrew), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

big-sleep by lucidrains

0%
3k
CLI tool for text-to-image generation
Created 4 years ago
Updated 3 years ago