Text-to-image/video tool using CLIP and FFT/DWT/RGB, no GANs
This project provides text-to-image and text-to-video generation tools, leveraging CLIP and FFT/DWT/RGB parameterizations for no-GAN image synthesis. It targets artists and researchers seeking to create detailed textures, high-resolution images, and animated sequences from textual descriptions, offering advanced control over style, composition, and motion.
How It Works
The core of Aphantasia uses CLIP to score how well a candidate image matches the text prompt, and the Lucent library's FFT/DWT/RGB parameterizations to represent the image itself. This approach avoids GANs entirely: images emerge from iterative gradient-based optimization of these parameters against the CLIP score. It supports complex queries with weighted terms, style prompts, and negative prompts, and incorporates Depth Anything 2 for 3D effects. The system can also use LPIPS loss to reproduce a reference image, and offers various optimization techniques for stability and detail.
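The FFT parameterization at the heart of this pipeline can be sketched in a few lines. The snippet below is a minimal NumPy illustration of the decoding step only: it turns frequency-domain coefficients (random here, learned in the real tool) into an RGB image, with a hypothetical 1/f^decay scaling that biases optimization toward smooth, natural-looking images. Function and parameter names are illustrative, not the project's actual API.

```python
import numpy as np

def fft_image(size=(64, 64), channels=3, decay=1.0, seed=0):
    """Decode an RGB image from frequency-domain parameters.

    In Aphantasia-style synthesis, spectrum coefficients like these are
    the learnable parameters; here they are random, to show only the
    decode step. `decay` controls a hypothetical 1/f^decay scaling.
    """
    h, w = size
    rng = np.random.default_rng(seed)
    # Complex spectrum in rfft2 layout: (channels, h, w//2 + 1)
    spectrum = rng.standard_normal((channels, h, w // 2 + 1, 2))
    spectrum = spectrum[..., 0] + 1j * spectrum[..., 1]
    # Radial frequency magnitude for each spectrum bin
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freqs = np.sqrt(fx ** 2 + fy ** 2)
    # Damp high frequencies (1/f prior), clamping the DC term
    scale = 1.0 / np.maximum(freqs, 1.0 / max(h, w)) ** decay
    spectrum *= scale
    # Inverse FFT back to pixel space, then normalize to [0, 1]
    img = np.fft.irfft2(spectrum, s=(h, w))
    img = (img - img.min()) / (np.ptp(img) + 1e-8)
    return img.transpose(1, 2, 0)  # channels-last (H, W, C)

img = fft_image()
print(img.shape)  # (64, 64, 3)
```

In the full tool, an optimizer would backpropagate the CLIP similarity loss through this decode step to update the spectrum coefficients; operating in frequency space rather than raw pixels is what gives the method its characteristic textured, multi-scale detail.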
Quick Start & Requirements
Install the dependencies:
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
Then generate an image from a text prompt:
python clip_fft.py -t "your text prompt" --size 1280-720
Maintenance & Community
The project builds on OpenAI's CLIP and the Lucent library, with credits given to contributors for ideas and specific components. The README provides no community links (Discord/Slack) and no roadmap.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is described as having "evolved from artwork," suggesting it is an artistic tool rather than a production-ready, fully polished application. The README provides no performance benchmarks and no evidence of extensive user testing.
Last updated 5 months ago; the repository is currently inactive.