deep-daze  by lucidrains

CLI tool for text-to-image generation using CLIP and SIREN

created 4 years ago
4,354 stars

Top 11.4% on sourcepulse

Project Summary

Deep Daze is a command-line tool for text-to-image generation built on OpenAI's CLIP and SIREN, an implicit neural representation network. It generates images from text prompts, from short phrases to longer narratives, and is aimed at artists, researchers, and hobbyists interested in AI-driven creative tools.

How It Works

CLIP scores how well an image matches a text prompt. SIREN, a network with sinusoidal activations that represents fine, high-frequency detail well, produces the image from pixel coordinates. Deep Daze trains the SIREN's weights so that CLIP rates its output as a closer match to the prompt. Parameters such as the number of layers and the learning rate control output quality and complexity.
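
To make the SIREN half of this concrete, here is a minimal pure-Python sketch of a sine-activated network that maps pixel coordinates in [-1, 1] to RGB values. It is an illustration under stated assumptions, not Deep Daze's actual implementation: the helper names (`init_layer`, `build_siren`, `render`), the layer sizes, the frequency scales, the initialization bounds, and the tanh-based squashing to [0, 1] are all choices made for this example. The weights are random and untrained; in Deep Daze they would be optimized against CLIP's score for the prompt.

```python
import math
import random


def init_layer(rng, n_in, n_out, bound):
    """Draw weights and biases uniformly from [-bound, bound]."""
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases


def affine(x, weights, biases):
    """Compute W @ x + b for a single input vector."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def build_siren(hidden=16, depth=3, w0_first=30.0, w0=1.0, seed=0):
    """Return (sine_layers, output_layer) for a 2-D coordinate -> RGB network.

    The sizes, scales, and bounds here are assumptions for this sketch.
    """
    rng = random.Random(seed)
    sine_layers = []
    n_in = 2
    for i in range(depth):
        scale = w0_first if i == 0 else w0
        bound = 1.0 / n_in if i == 0 else math.sqrt(6.0 / n_in) / scale
        sine_layers.append((init_layer(rng, n_in, hidden, bound), scale))
        n_in = hidden
    output_layer = init_layer(rng, n_in, 3, math.sqrt(6.0 / n_in))
    return sine_layers, output_layer


def pixel_color(net, x, y):
    """Evaluate the network at one coordinate and squash the result to [0, 1]."""
    sine_layers, output_layer = net
    h = [x, y]
    for (weights, biases), scale in sine_layers:
        # The sine activation is what distinguishes a SIREN layer.
        h = [math.sin(scale * z) for z in affine(h, weights, biases)]
    rgb = affine(h, *output_layer)
    return [0.5 * (math.tanh(c) + 1.0) for c in rgb]


def render(net, width, height):
    """Evaluate the network on a width x height grid spanning [-1, 1] per axis."""
    image = []
    for row in range(height):
        y = 2.0 * row / max(height - 1, 1) - 1.0
        image.append(
            [pixel_color(net, 2.0 * col / max(width - 1, 1) - 1.0, y) for col in range(width)]
        )
    return image


if __name__ == "__main__":
    picture = render(build_siren(), width=8, height=8)
    print(len(picture), len(picture[0]), len(picture[0][0]))
```

Because the image is a function of continuous coordinates rather than a fixed pixel array, the same network can be sampled at any resolution by changing `width` and `height`.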

Quick Start & Requirements

  • Install via pip: pip install deep-daze
  • Requires an NVIDIA or AMD GPU. Recommended: 16GB VRAM. Minimum: 4GB VRAM (with very low settings).
  • Usage: imagine "your text prompt"
  • Official Notebooks: Original, Simplified

Highlighted Details

  • Supports image-to-image generation and "priming" with a starting image.
  • Includes a create_story mode for visualizing longer texts sequentially.
  • Offers extensive CLI arguments for fine-tuning generation parameters (e.g., num_layers, batch_size, image_width).
  • Provides VRAM and speed benchmarks for various configurations.
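
As a rough illustration of how these arguments might be combined, the sketch below assembles an `imagine` command line without running it. The helper `build_imagine_command` is hypothetical, the `--name=value` flag syntax is an assumption about how the CLI parses arguments, and the prompt and numeric values are placeholders rather than recommended settings; check `imagine --help` for the exact form.

```python
import shlex


def build_imagine_command(prompt, **options):
    """Return an argv list for the `imagine` CLI without executing it.

    Assumes flags take the form --name=value; boolean True becomes a bare
    --name flag, and None or False options are skipped.
    """
    argv = ["imagine", prompt]
    for name, value in options.items():
        if value is True:
            argv.append(f"--{name}")
        elif value is not None and value is not False:
            argv.append(f"--{name}={value}")
    return argv


if __name__ == "__main__":
    # Placeholder prompt and values, chosen only for illustration.
    argv = build_imagine_command(
        "a house in the forest",
        num_layers=24,
        batch_size=2,
        image_width=256,
    )
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Building the argument list separately makes it easy to review a configuration before committing GPU time to it.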

Maintenance & Community

  • Developed by lucidrains, a prolific contributor in the AI/ML space.
  • Project activity and community support can be gauged via GitHub issues and pull requests.

Licensing & Compatibility

  • The project appears to be MIT-licensed, which permits broad use and modification, including commercial use.

Limitations & Caveats

  • Performance is heavily dependent on GPU VRAM, with lower-end cards requiring significant parameter tuning.
  • The create_story mode's effectiveness with very long texts may vary.
  • While powerful, the tool requires experimentation with parameters to achieve desired results.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
