Multimodal AI for animated short stories from text prompts
This project provides a multimodal AI storyteller that generates short animated videos from a text prompt. It's designed for users interested in AI-driven content creation, enabling them to produce visual narratives with accompanying audio.
How It Works
The system orchestrates a pipeline of AI models. A GPT model expands a given prompt into a story, generating text sentence by sentence. For each sentence, Stable Diffusion creates a corresponding image. Finally, a neural text-to-speech (TTS) model narrates the story, and these components are combined into a video. This approach automates the entire creative process from text to a complete audiovisual experience.
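The sketch below is not the project's code; it illustrates the same orchestration with stock Hugging Face transformers and diffusers components, with the TTS and video-muxing stages left as comments. The model choices, the naive sentence split, and the file names are assumptions for illustration.

from transformers import pipeline
from diffusers import StableDiffusionPipeline
import torch

# Writer: expands the prompt into a story, sentence by sentence.
writer = pipeline("text-generation", model="gpt2")
# Painter: renders one image per sentence.
painter = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2", torch_dtype=torch.float16
).to("cuda")

prompt = "Once upon a time, unicorns roamed the Earth."
story = writer(prompt, max_new_tokens=100)[0]["generated_text"]

# One frame per sentence, as described above (naive split on periods).
for i, sentence in enumerate(s.strip() for s in story.split(".") if s.strip()):
    painter(sentence).images[0].save(f"frame_{i}.png")
# A TTS model would then narrate each sentence, and the frames and audio
# would be combined into the final video (e.g., with ffmpeg).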
Quick Start & Requirements
Install from PyPI:

$ pip install storyteller-core

Or install the latest version from source:

$ git clone https://github.com/jaketae/storyteller.git && cd storyteller && pip install .

On macOS, mecab is also required; it can be installed via Homebrew (brew install mecab). Verify the installation with:

$ storyteller --help
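Beyond the CLI, the package can also be driven from Python. Below is a minimal sketch assuming the package exposes a StoryTeller class with from_default and generate entry points; these names are assumptions, so verify them against the upstream README.

from storyteller import StoryTeller  # assumed import path

# Assumed: loads the default writer/painter/speaker pipeline.
story_teller = StoryTeller.from_default()
# Assumed signature: expands the prompt and writes the resulting video to disk.
story_teller.generate("Once upon a time, unicorns roamed the Earth.")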
Highlighted Details
The default pipeline pairs a gpt2 writer with a stabilityai/stable-diffusion-2 painter and a neural TTS speaker, orchestrated end to end from a single text prompt.
Maintenance & Community
The project is maintained by jaketae. Further community engagement channels are not explicitly listed in the README.
Licensing & Compatibility
Limitations & Caveats
PyTorch support for Apple Silicon's MPS backend is noted as a work in progress, with potential issues on specific PyTorch versions. The default models (e.g., gpt2, stabilityai/stable-diffusion-2) may require significant VRAM when running on GPU.
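Given these caveats, a defensive device-selection pattern is common. The sketch below (not project code) prefers CUDA, falls back to MPS when available, and otherwise runs on CPU.

import torch

# Prefer CUDA for the VRAM-hungry defaults; fall back to Apple-Silicon MPS
# (work-in-progress upstream), else run on CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")
print(f"Using device: {device}")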