Hotshot-XL by hotshotco

Text-to-GIF model for Stable Diffusion XL

created 1 year ago
1,103 stars

Project Summary

Hotshot-XL is a text-to-GIF model built to work alongside Stable Diffusion XL (SDXL). It can generate animated GIFs with any fine-tuned SDXL model or with a user's own LoRAs, so personalized subjects and existing SDXL workflows carry over without retraining. The main benefit is text-to-GIF generation that stays compatible with the SDXL ecosystem and its control tools.

How It Works

Hotshot-XL adds temporal layers to SDXL's image generation pipeline. It was trained to produce 1-second GIFs at 8 FPS (8 frames per clip), at various aspect ratios around 512x512 resolution. Because it works alongside SDXL rather than replacing it, it can be combined with SDXL ControlNet for compositional control and with custom LoRAs, without re-training the core Hotshot-XL model.
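
The training target above (1 second at 8 FPS) fixes both the number of frames and how long each GIF frame is shown. A minimal sketch of that arithmetic follows; the function name gif_timing is hypothetical and is not part of the Hotshot-XL codebase.

```python
def gif_timing(duration_s: float = 1.0, fps: int = 8) -> tuple[int, int]:
    """Return (frame_count, per_frame_delay_ms) for a clip.

    Defaults mirror the training setup described above: 1 second at 8 FPS.
    This helper is illustrative only and is not a Hotshot-XL API.
    """
    frame_count = round(duration_s * fps)   # frames the model must generate
    delay_ms = round(1000 / fps)            # display time of each GIF frame
    return frame_count, delay_ms


frames, delay = gif_timing()
print(frames, delay)  # 8 125
```

Changing fps or duration_s here only changes the arithmetic; per the caveats below, the model itself may produce unstable results away from its training settings.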

Quick Start & Requirements

  • Clone the repository, then install dependencies with pip install -r requirements.txt.
  • git-lfs is required to download the model weights.
  • For best results, use an SDXL model fine-tuned at 512x512 resolution (e.g., hotshotco/SDXL-512).
  • Inference example: python inference.py --prompt="a bulldog in the captains chair of a spaceship, hd, high quality" --output="output.gif"
  • Official Try It page: https://www.hotshot.co
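
When scripting several generations, the inference command above can be assembled programmatically. The sketch below only builds the argument list using the two flags shown in the example (--prompt and --output); the helper name build_inference_command is hypothetical, and it assumes the command is run from the root of the cloned repository.

```python
import shlex


def build_inference_command(prompt: str, output: str = "output.gif") -> list[str]:
    """Build the argv list for the inference example shown above.

    Only the --prompt and --output flags from the example are used;
    this helper is an illustration, not part of the repository.
    """
    return ["python", "inference.py", f"--prompt={prompt}", f"--output={output}"]


cmd = build_inference_command(
    "a bulldog in the captains chair of a spaceship, hd, high quality"
)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal

# To actually run it (assumes the current directory is the cloned repo):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with prompts that contain commas, quotes, or spaces.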

Highlighted Details

  • Compatible with SDXL ControlNet for layout control.
  • Supports text-to-GIF generation with personalized LoRAs.
  • Allows varying aspect ratios and frame rates (experimental).
  • Can generate single images by setting video_length=1.
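
To illustrate what "aspect ratios around 512x512" can mean in practice, the sketch below picks a width and height whose area stays close to 512x512 for a requested aspect ratio. This is an assumption-laden illustration: the helper size_for_aspect is hypothetical, the rounding to multiples of 8 is an assumption chosen to suit latent diffusion models, and the results are not the project's list of supported resolutions.

```python
import math


def size_for_aspect(
    aspect: float, target_area: int = 512 * 512, multiple: int = 8
) -> tuple[int, int]:
    """Return (width, height) near target_area for a width/height ratio.

    Hypothetical helper: both sides are snapped to a multiple of 8
    (an assumption), so the final ratio is only approximate.
    """
    height = math.sqrt(target_area / aspect)
    width = height * aspect

    def snap(value: float) -> int:
        # Round to the nearest multiple, never below one multiple.
        return max(multiple, round(value / multiple) * multiple)

    return snap(width), snap(height)


print(size_for_aspect(1.0))     # (512, 512)
print(size_for_aspect(16 / 9))  # (680, 384)
```

Resolutions produced this way should be checked against the repository's documentation before use, since the model was trained on a specific set of aspect ratios.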

Maintenance & Community

  • The roadmap lists planned improvements, including higher frame rates, higher resolutions, and Multi-ControlNet support.
  • Community contributions are encouraged via GitHub issues and PRs.
  • Discord server available for community interaction.

Licensing & Compatibility

  • Licensed under Apache-2.0.
  • The license permits commercial use and use in closed-source products.

Limitations & Caveats

  • Experimental frame rate and duration variations may yield unstable results; fine-tuning is recommended for improved stability.
  • Generating GIFs outside supported aspect ratios may require fine-tuning Hotshot-XL with custom video data.
  • Currently supports only one ControlNet model at a time.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
