big-sleep by lucidrains

CLI tool for text-to-image generation

created 4 years ago
2,570 stars

Top 18.6% on sourcepulse

Project Summary

Big Sleep is a command-line tool and Python library that generates images from text prompts using OpenAI's CLIP and BigGAN. It is aimed at users with a GPU who want to experiment with text-to-image synthesis through a simple interface.

How It Works

The tool uses CLIP to score how closely a generated image matches the text prompt, and uses that score to steer a BigGAN generator toward matching images. In effect, it "dreams" visuals from natural language descriptions.
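The idea of steering a generator with a similarity score can be illustrated with a toy sketch. This is not Big Sleep's code: the linear map `toy_image_embedding` is a hypothetical stand-in for "generate an image with BigGAN, then embed it with CLIP", the 2-dimensional vectors are made up, and the finite-difference gradient replaces the automatic differentiation a real implementation would presumably use. It only shows the loop structure: score the current latent, estimate which direction raises the score, and step that way.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def toy_image_embedding(latent, weights):
    """Hypothetical stand-in for 'generate an image, then embed it': a linear map."""
    return [sum(w * z for w, z in zip(row, latent)) for row in weights]


def optimize_latent(text_embedding, weights, latent, steps=300, lr=0.1, eps=1e-4):
    """Gradient ascent on the similarity score, using finite-difference gradients."""
    latent = list(latent)
    scores = []
    for _ in range(steps):
        score = cosine_similarity(toy_image_embedding(latent, weights), text_embedding)
        scores.append(score)
        grad = []
        for i in range(len(latent)):
            # Nudge one latent coordinate and measure how the score changes.
            bumped = list(latent)
            bumped[i] += eps
            bumped_score = cosine_similarity(
                toy_image_embedding(bumped, weights), text_embedding
            )
            grad.append((bumped_score - score) / eps)
        # Move the latent in the direction that increases the score.
        latent = [z + lr * g for z, g in zip(latent, grad)]
    # Record the score of the final latent as the last entry.
    scores.append(cosine_similarity(toy_image_embedding(latent, weights), text_embedding))
    return latent, scores


if __name__ == "__main__":
    weights = [[1.0, 0.5], [0.0, 1.0]]  # made-up "generator + encoder"
    text_embedding = [0.0, 1.0]         # made-up "prompt embedding"
    latent, scores = optimize_latent(text_embedding, weights, latent=[1.0, 0.0])
    print(f"similarity: {scores[0]:.3f} -> {scores[-1]:.3f}")
```

In the real tool the latent would be the generator's input and the score would come from CLIP; the toy keeps only the optimization pattern.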

Quick Start & Requirements

  • Install via pip: pip install big-sleep
  • Requires a GPU.
  • Usage: dream "a pyramid made of ice"
  • Advanced usage and code examples are available in the README.

Highlighted Details

  • Supports training on multiple phrases using a "|" delimiter.
  • Allows penalizing specific prompts to steer generation away from unwanted elements.
  • Option to use a larger OpenAI vision model (--larger-model) for potentially improved generations.
  • Can save the best high-scoring image during generation (--save-best).
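As a rough sketch of how these options combine on the command line, the helper below assembles an argument list for `dream` without running it. The function name `build_dream_command` is hypothetical and not part of big-sleep. It assumes that `--larger-model` and `--save-best` are passed as bare flags and that `--max-classes` (described under Limitations & Caveats) takes an integer value. The second phrase in the demo is made up for illustration.

```python
import shlex


def build_dream_command(phrases, larger_model=False, save_best=False, max_classes=None):
    """Assemble the argument list for a `dream` invocation without running it."""
    if not phrases:
        raise ValueError("at least one phrase is required")
    # Multiple phrases are joined with the "|" delimiter into one prompt argument.
    command = ["dream", "|".join(phrases)]
    if larger_model:
        command.append("--larger-model")
    if save_best:
        command.append("--save-best")
    if max_classes is not None:
        # Assumption: the flag takes an integer value.
        command += ["--max-classes", str(max_classes)]
    return command


if __name__ == "__main__":
    args = build_dream_command(
        ["a pyramid made of ice", "a desert at night"],
        save_best=True,
        max_classes=15,
    )
    # shlex.join quotes the prompt so it can be pasted into a shell.
    print(shlex.join(args))
```

The resulting list could be passed to `subprocess.run` on a machine where big-sleep is installed and a GPU is available.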

Maintenance & Community

The project is based on work by Ryan Murdock and is available on GitHub. Links to original and simplified notebooks are provided.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README mentions that Big Sleep can sometimes steer off-manifold into noise due to the class-conditioned nature of the GAN. The --max-classes flag is suggested for stability at the cost of expressivity.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 90 days
