Awesome-Multimodal-Prompts by langgptai

Multimodal prompt collection for GPT-4V & DALL-E3

Created 2 years ago

284 stars

Top 92.0% on SourcePulse

Project Summary

This repository curates prompts for multimodal large language models (LLMs) like GPT-4V and DALL-E 3, enabling users to leverage their advanced vision and image generation capabilities. It serves as a resource for developers, researchers, and AI enthusiasts looking to explore and implement sophisticated multimodal interactions.

How It Works

The collection showcases prompt engineering techniques for various multimodal tasks. It categorizes prompts by application, including visual question answering, code generation from UI mockups, document analysis, and creative image synthesis with DALL-E 3. The prompts are designed to elicit specific, complex outputs by guiding the models through detailed instructions and contextual information.

Quick Start & Requirements

Access to GPT-4V and DALL-E 3 APIs or interfaces is required.
Clone the repository to access prompt examples.
Links to relevant papers and resources are provided within the README.

Highlighted Details

Demonstrates advanced GPT-4V capabilities like visual referring, prompt injection for CAPTCHA solving, and math formula recognition.
Features a wide array of DALL-E 3 prompt styles, from technical diagrams and pixel art to specific artistic styles and text generation.
Includes sections on video understanding and multimodal chain-of-thought prompting.
Lists and briefly describes other multimodal LLM projects like LLaVA and CogVLM.

Maintenance & Community

The repository is community-driven, with contributions and examples sourced from various online platforms and research papers. Links to relevant Twitter threads and papers are included for further exploration.

Licensing & Compatibility

The repository itself does not specify a license. The prompts are intended for use with OpenAI's models, subject to their respective terms of service.

Limitations & Caveats

This is a curated list of prompts and does not include the models themselves. The effectiveness of prompts is dependent on the specific capabilities and updates of the underlying multimodal models (GPT-4V, DALL-E 3). Some examples may require specific API access or versions.

Awesome-Multimodal-Prompts by langgptai

Explore Similar Projects

ShareGPT4V by ShareGPT4Omni

lens by ContextualAI

Lumina-mGPT by Alpha-VLLM

Awesome-Prompting-on-Vision-Language-Model by JindongGu

gpt-image-2-skill by UzenUPozitiv4ik

Osprey by CircleRadon

Awesome-GPT4o-Image-Prompts by ImgEdify

MM-REACT by microsoft

MiniGPT-4-ZH by RiseInRose

GLIGEN by gligen

Awesome-Text-to-Image by Yutong-Zhou-cv

awesome-gpt4o-images by jamez-bondos