Image composer using LLMs to generate code for image creation
Top 6.9% on sourcepulse
Omost is a project that leverages Large Language Models (LLMs) to translate coding capabilities into image composition. It targets users who want to generate images through detailed, structured descriptions, enabling precise control over visual elements and their arrangement. The core benefit is a more programmatic and controllable approach to image generation compared to traditional text-to-image models.
How It Works
Omost utilizes LLMs to generate Python code that interacts with a virtual "Canvas agent." This agent allows for granular control over image composition by defining global scene descriptions and adding local elements with specific locations, offsets, areas, and relative depths. The LLM-generated code is designed to be easily interpretable by diffusion models, facilitating precise image rendering. The project emphasizes a "sub-prompt" strategy for descriptions, breaking down complex prompts into smaller, self-contained units to improve LLM understanding and prevent semantic truncation during encoding.
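The two ideas above, a canvas that holds a global description plus depth-ordered local elements, and descriptions made of self-contained sub-prompts, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `SketchCanvas`, `LocalElement`, `set_global`, `add_local`, `back_to_front`, and `pack_sub_prompts` are hypothetical names, not Omost's actual API, and word count stands in for real encoder tokens.

```python
from dataclasses import dataclass, field


@dataclass
class LocalElement:
    location: str            # e.g. "in the center"
    offset: str              # e.g. "no offset"
    area: str                # e.g. "a large square area"
    depth: float             # relative distance to the viewer; larger = farther
    sub_prompts: list[str]   # short, self-contained descriptions


@dataclass
class SketchCanvas:
    """Hypothetical stand-in for a canvas agent; not Omost's real class."""
    global_sub_prompts: list[str] = field(default_factory=list)
    elements: list[LocalElement] = field(default_factory=list)

    def set_global(self, sub_prompts: list[str]) -> None:
        # Store the scene-wide description as a list of sub-prompts.
        self.global_sub_prompts = list(sub_prompts)

    def add_local(self, location: str, offset: str, area: str,
                  depth: float, sub_prompts: list[str]) -> None:
        # Record one local element with its placement and relative depth.
        self.elements.append(
            LocalElement(location, offset, area, depth, list(sub_prompts))
        )

    def back_to_front(self) -> list[LocalElement]:
        # Farther elements come first so nearer ones can be layered on top.
        return sorted(self.elements, key=lambda e: e.depth, reverse=True)


def pack_sub_prompts(sub_prompts: list[str], budget: int = 75) -> list[list[str]]:
    """Greedily group whole sub-prompts into chunks of at most `budget` words.

    Word count is a crude proxy for encoder tokens (an assumption of this
    sketch). A sub-prompt is never split; one that alone exceeds the budget
    is placed in a chunk of its own.
    """
    chunks: list[list[str]] = []
    current: list[str] = []
    used = 0
    for sp in sub_prompts:
        n = len(sp.split())
        if current and used + n > budget:
            chunks.append(current)
            current, used = [], 0
        current.append(sp)
        used += n
    if current:
        chunks.append(current)
    return chunks
```

Because each sub-prompt is kept whole, a chunk boundary never falls in the middle of a description, which is the property the sub-prompt strategy is meant to protect during encoding.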
Quick Start & Requirements
Create a conda environment (`conda create -n omost python=3.10`), activate it (`conda activate omost`), install PyTorch with CUDA 12.1 support (`pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121`), and install requirements (`pip install -r requirements.txt`).
Launch the app with `python gradio_app.py`.
Relies on `bitsandbytes`, which can cause issues on older GPUs (9XX, 10XX, 20XX series).
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
`omost-phi-3-mini-128k` is unreliable beyond ~8k tokens.
Quantizing `omost-phi-3-mini-128k` to 4 bits is not recommended due to performance degradation.
The `omost-dolphin-2.9-llama3-8b` model is trained without NSFW filtering and requires user-applied safety alignment for public services.