deep-daze by lucidrains

CLI tool for text-to-image generation using CLIP and SIREN

Created 5 years ago

4,329 stars

Top 11.2% on SourcePulse

6 Experts Love This Project

Phil Wang

Prolific Research Paper Implementer

Luis Capelo

Cofounder of Lightning AI

Paras Jain

Cofounder of Genmo

Johannes Hagemann

Cofounder of Prime Intellect

and 2 more!

Project Summary

Deep Daze is a command-line tool for text-to-image generation built on OpenAI's CLIP and Siren, an implicit neural representation network. It generates images from text prompts ranging from short phrases to longer narratives, and is aimed at artists, researchers, and hobbyists interested in AI-driven creative tools.

How It Works

The tool pairs two models. Siren, a network with sine activations that represents high-frequency detail well, maps pixel coordinates to colors and so acts as the image generator. CLIP scores how well the rendered image matches the text prompt, and the Siren weights are optimized to raise that score. Parameters such as the number of layers and the learning rate trade output quality and complexity against memory use and run time.
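
To make the Siren half of this concrete, here is a minimal pure-Python sketch of a coordinate-to-color network with sine activations. It is an illustration only, not deep-daze's implementation: the layer sizes, the `w0` frequency scale, the initialization bounds, and the sigmoid on the output are all assumptions chosen for the example, and the weights are random rather than optimized against CLIP.

```python
import math
import random


def init_layer(n_in, n_out, rng, w0=1.0, first=False):
    """Random weights and biases for one layer (bounds are an assumption)."""
    bound = 1.0 / n_in if first else math.sqrt(6.0 / n_in) / w0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases


def sine_layer(inputs, weights, biases, w0=1.0):
    """One sine-activated layer: sin(w0 * (W x + b))."""
    return [
        math.sin(w0 * (sum(w * x for w, x in zip(row, inputs)) + b))
        for row, b in zip(weights, biases)
    ]


def render(width, height, hidden=16, depth=3, seed=0):
    """Evaluate a small random sine network on a pixel grid.

    Returns a height x width grid of [r, g, b] values in (0, 1).
    """
    if width < 2 or height < 2:
        raise ValueError("width and height must be at least 2")
    rng = random.Random(seed)
    # Hypothetical configuration: a larger w0 on the first layer only.
    layers = [(init_layer(2, hidden, rng, first=True), 30.0)]
    layers += [(init_layer(hidden, hidden, rng), 1.0) for _ in range(depth - 1)]
    out_w, out_b = init_layer(hidden, 3, rng)

    image = []
    for row in range(height):
        line = []
        for col in range(width):
            # Pixel position normalized to [-1, 1] on both axes.
            h = [2 * col / (width - 1) - 1, 2 * row / (height - 1) - 1]
            for (w, b), w0 in layers:
                h = sine_layer(h, w, b, w0)
            # Linear output layer, squashed to (0, 1) with a sigmoid.
            rgb = [
                1 / (1 + math.exp(-(sum(wi * hi for wi, hi in zip(r, h)) + bi)))
                for r, bi in zip(out_w, out_b)
            ]
            line.append(rgb)
        image.append(line)
    return image
```

Calling `render(64, 64)` yields a grid of colors from untrained weights. In a text-to-image setting, a training loop would update those weights so that CLIP rates the rendered grid as a closer match to the prompt.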

Quick Start & Requirements

Install via pip: pip install deep-daze
Requires an NVIDIA or AMD GPU. Recommended: 16GB VRAM. Minimum: 4GB VRAM (with very low settings).
Usage: imagine "your text prompt"
Official Notebooks: Original, Simplified
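
Besides the `imagine` command, the package can be driven from Python. The sketch below assumes the `Imagine` class described in the project's README, constructed with `text` and `num_layers` keyword arguments and invoked by calling the instance. The wrapper name `generate` and the default of 16 layers are hypothetical choices for this example.

```python
def generate(prompt, num_layers=16):
    """Run one deep-daze generation for `prompt`.

    Assumes deep-daze is installed and a supported GPU is available.
    """
    # Imported lazily so this module still loads without deep-daze installed.
    from deep_daze import Imagine

    imagine = Imagine(text=prompt, num_layers=num_layers)
    imagine()  # runs the optimization loop
    return imagine
```

A call such as `generate("a house in the forest")` starts a run. Lowering `num_layers` is one way to fit a card with less VRAM.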

Highlighted Details

Supports image-to-image generation and "priming" with a starting image.
Includes a create_story mode for visualizing longer texts sequentially.
Offers extensive CLI arguments for fine-tuning generation parameters (e.g., num_layers, batch_size, image_width).
Provides VRAM and speed benchmarks for various configurations.
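
When scripting many runs, it can help to assemble the `imagine` command programmatically. The helper below is a hypothetical sketch: `build_imagine_command` is not part of deep-daze, the flag spelling (`--num_layers`, `--batch_size`, `--image_width`) is assumed from the argument names listed above, and the values are illustrative rather than recommended settings.

```python
import shlex


def build_imagine_command(prompt, **options):
    """Return an argv list for an `imagine` run.

    True becomes a bare flag, False and None are skipped, and any other
    value is passed as `--name value`.
    """
    argv = ["imagine", prompt]
    for name, value in options.items():
        flag = f"--{name}"
        if value is True:
            argv.append(flag)
        elif value is not False and value is not None:
            argv.extend([flag, str(value)])
    return argv


cmd = build_imagine_command(
    "a house in the forest",
    num_layers=16,    # illustrative value
    batch_size=4,     # illustrative value
    image_width=256,  # illustrative value
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to launch the run. Building an argv list instead of a shell string avoids quoting problems with prompts that contain spaces or punctuation.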

Maintenance & Community

Developed by lucidrains, a prolific contributor in the AI/ML space.
The last commit was about four years ago and responsiveness is rated inactive, so support is likely limited. GitHub issues and pull requests are the only channels listed.

Licensing & Compatibility

The project is MIT licensed, which permits broad use and modification, including commercial applications.

Limitations & Caveats

Performance is heavily dependent on GPU VRAM, with lower-end cards requiring significant parameter tuning.
The create_story mode's effectiveness with very long texts may vary.
Getting a desired result usually takes experimentation with parameters such as the number of layers and the learning rate.

Health Check

Last Commit: 4 years ago

Responsiveness: Inactive

Star History

1 star in the last 30 days