Text-to-image/video tool using CLIP and FFT/DWT/RGB, no GANs
This project provides text-to-image and text-to-video generation tools, leveraging CLIP and FFT/DWT/RGB parameterizations for no-GAN image synthesis. It targets artists and researchers seeking to create detailed textures, high-resolution images, and animated sequences from textual descriptions, offering advanced control over style, composition, and motion.
How It Works
The core of Aphantasia uses CLIP to score how well a candidate image matches the text prompt, and the Lucent library's FFT/DWT/RGB parameterizations to represent the image itself. This approach avoids GANs entirely: images emerge from iterative gradient-based optimization of these parameters against the CLIP score. It supports complex queries with weighted terms, style prompts, and negative prompts, and incorporates Depth Anything 2 for 3D effects. The system can also use LPIPS loss to reproduce a reference image, and offers various optimization techniques for stability and detail.
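The FFT parameterization at the heart of this pipeline can be sketched in a few lines. The snippet below is a minimal NumPy illustration of the decoding step only: it turns frequency-domain coefficients (random here, learned in the real tool) into an RGB image, with a hypothetical 1/f^decay scaling that biases optimization toward smooth, natural-looking images. Function and parameter names are illustrative, not the project's actual API.

```python
import numpy as np

def fft_image(size=(64, 64), channels=3, decay=1.0, seed=0):
    """Decode an RGB image from frequency-domain parameters.

    In Aphantasia-style synthesis, spectrum coefficients like these are
    the learnable parameters; here they are random, to show only the
    decode step. `decay` controls a hypothetical 1/f^decay scaling.
    """
    h, w = size
    rng = np.random.default_rng(seed)
    # Complex spectrum in rfft2 layout: (channels, h, w//2 + 1)
    spectrum = rng.standard_normal((channels, h, w // 2 + 1, 2))
    spectrum = spectrum[..., 0] + 1j * spectrum[..., 1]
    # Radial frequency magnitude for each spectrum bin
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.rfftfreq(w)[None, :]
    freqs = np.sqrt(fx ** 2 + fy ** 2)
    # Damp high frequencies (1/f prior), clamping the DC term
    scale = 1.0 / np.maximum(freqs, 1.0 / max(h, w)) ** decay
    spectrum *= scale
    # Inverse FFT back to pixel space, then normalize to [0, 1]
    img = np.fft.irfft2(spectrum, s=(h, w))
    img = (img - img.min()) / (np.ptp(img) + 1e-8)
    return img.transpose(1, 2, 0)  # channels-last (H, W, C)

img = fft_image()
print(img.shape)  # (64, 64, 3)
```

In the full tool, an optimizer would backpropagate the CLIP similarity loss through this decode step to update the spectrum coefficients; operating in frequency space rather than raw pixels is what gives the method its characteristic textured, multi-scale detail.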
Quick Start & Requirements
Install the dependencies:
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
Then generate an image from a text prompt:
python clip_fft.py -t "your text prompt" --size 1280-720
Maintenance & Community
The project builds on OpenAI's CLIP and the Lucent library, with credits given to contributors for ideas and specific components. The README provides no community links (Discord/Slack) and no roadmap.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is described as having "evolved from artwork," suggesting it is an artistic tool rather than a production-ready, fully polished application. The README provides no performance benchmarks and no evidence of extensive user testing.
Last updated 5 months ago; the repository is currently inactive.