aphantasia by eps696

Text-to-image/video tool using CLIP and FFT/DWT/RGB, no GANs

created 4 years ago
789 stars

Top 45.3% on sourcepulse

Project Summary

This project provides text-to-image and text-to-video generation tools that combine CLIP with FFT, DWT, or RGB image parameterizations and use no GANs. It targets artists and researchers who want to create detailed textures, high-resolution images, and animated sequences from text descriptions, with control over style, composition, and motion.

How It Works

Aphantasia uses CLIP to interpret text prompts and the Lucent library for its FFT/DWT/RGB image parameterizations. Instead of sampling from a GAN, it generates an image by iteratively optimizing those parameters against the prompt. It supports complex queries with weighted terms, style prompts, and negative prompts, and incorporates Depth Anything 2 for 3D effects. It can also use an LPIPS loss for image reproduction and offers several optimization techniques for stability and detail.
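
The idea of optimizing a frequency-domain parameterization rather than raw pixels can be sketched in a few lines. This is a minimal NumPy illustration, not the project's code. It assumes a single-channel image, a 1/f frequency scaling of the kind used in Lucid-style Fourier parameterizations, and a plain pixel-space squared error standing in for the CLIP loss. The names `frequency_scale`, `decode`, and `optimize` are hypothetical.

```python
import numpy as np


def frequency_scale(height, width):
    """Per-frequency gain of roughly 1/f, clamped so the DC term stays finite."""
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    freq = np.sqrt(fx ** 2 + fy ** 2)
    return 1.0 / np.maximum(freq, 1.0 / max(height, width))


def decode(spectrum, scale):
    """Map the learnable complex spectrum to a real-valued image."""
    return np.fft.ifft2(spectrum * scale).real


def optimize(target, steps=200, lr=1.0, seed=0):
    """Gradient descent on the spectrum; pixel MSE is a stand-in for a CLIP loss."""
    height, width = target.shape
    rng = np.random.default_rng(seed)
    scale = frequency_scale(height, width)
    # Learnable parameters: one complex coefficient per frequency.
    spectrum = 0.01 * (rng.standard_normal((height, width))
                       + 1j * rng.standard_normal((height, width)))
    # Normalise the step so that lr in (0, 2) is stable for any image shape.
    step = lr * height * width / scale.max() ** 2
    losses = []
    for _ in range(steps):
        residual = decode(spectrum, scale) - target
        losses.append(0.5 * float(np.sum(residual ** 2)))
        # Analytic gradient of the squared error w.r.t. the complex spectrum.
        grad = scale * np.fft.fft2(residual) / (height * width)
        spectrum -= step * grad
    return decode(spectrum, scale), losses


if __name__ == "__main__":
    target = np.outer(np.hanning(32), np.hanning(32))  # smooth test pattern
    image, losses = optimize(target)
    print(f"loss: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

Because low frequencies receive a larger gain, they converge first and fine detail fills in later. In a real text-to-image setup the squared-error term would be replaced by a similarity score between CLIP embeddings of the image and the prompt, with gradients computed by automatic differentiation.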

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt and pip install git+https://github.com/openai/CLIP.git
  • Run image generation: python clip_fft.py -t "your text prompt" --size 1280-720
  • Requires Python 3.7-3.11 and PyTorch 1.7.1-2.3.1. GPU memory is a primary constraint.
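
For scripted or batch runs, the command above can be assembled from Python. This sketch uses only the `-t` and `--size` options shown in the quick start. The helper names `build_command` and `run_generation`, the `python` executable name, and the reading of `1280-720` as width then height are assumptions.

```python
import subprocess


def build_command(prompt, width, height, python="python"):
    """Build the argument list for the documented clip_fft.py invocation."""
    # Assumption: --size takes "<width>-<height>", as the 1280-720 example suggests.
    return [python, "clip_fft.py", "-t", prompt, "--size", f"{width}-{height}"]


def run_generation(prompt, width=1280, height=720):
    """Run the generator; call this from inside a checkout with dependencies installed."""
    subprocess.run(build_command(prompt, width, height), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works anywhere.
    print(" ".join(build_command("your text prompt", 1280, 720)))
```

Passing the arguments as a list avoids shell quoting problems when a prompt contains spaces or quotes.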

Highlighted Details

  • Generates high-resolution images (4K+) and detailed textures.
  • Supports various CLIP models (ViT-B/32, ViT-B/16, RN50, etc.) and dual-model usage.
  • Offers continuous mode for illustrating lyrics and text-to-video generation with pan/zoom.
  • Integrates Depth Anything 2 for 3D visual effects.
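
Weighted terms, style prompts, and negative prompts can all be read as terms in one combined objective. The sketch below shows one plausible formulation, not the project's actual loss: each prompt contributes its cosine similarity to the image embedding multiplied by a weight, and a negative weight pushes the image away from that prompt. The embeddings are plain lists standing in for CLIP outputs, and the names `cosine` and `combined_loss` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def combined_loss(image_embedding, weighted_prompts):
    """Negative weighted sum of similarities; lower is better.

    weighted_prompts is a list of (embedding, weight) pairs. A positive
    weight attracts the image to the prompt, a negative weight repels it.
    """
    return -sum(weight * cosine(image_embedding, embedding)
                for embedding, weight in weighted_prompts)


if __name__ == "__main__":
    image = [1.0, 0.0]
    prompts = [
        ([1.0, 0.0], 1.0),    # main prompt, fully matched
        ([0.0, 1.0], 0.5),    # style prompt, unrelated to the image here
        ([-1.0, 0.0], -1.0),  # negative prompt, already opposite to the image
    ]
    print(combined_loss(image, prompts))
```

Under the same assumption, dual-model usage would add one such sum per CLIP model, each computed with that model's own embeddings.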

Maintenance & Community

The project builds on OpenAI's CLIP and the Lucent library. The README credits contributors for ideas and specific components. It provides no community links (Discord/Slack) and no roadmap.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as "evolved from artwork," suggesting it may not be a production-ready, fully polished application. Specific details on performance benchmarks or extensive user testing are not provided.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
