Ovi by character-ai

Cross-modal fusion for synchronized audio-video generation

Created 1 month ago
1,130 stars

Top 34.1% on SourcePulse

Project Summary

Summary

Ovi is an open-source model that generates synchronized video and audio from text or text+image inputs. It tackles the problem of producing cohesive multimodal media with a unified approach that generates the visual and audio streams simultaneously. It is aimed at researchers and developers working on AI media generation.

How It Works

The model uses a "Twin Backbone Cross-Modal Fusion" architecture to process and generate video and audio concurrently, which keeps the two streams closely synchronized in time. It can be conditioned on text alone or on text plus an image, giving users a range of creative applications and finer control over the result.
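This summary does not say how the two backbones exchange information. As a rough illustration only, the sketch below assumes a symmetric cross-attention exchange, a common way to couple two token streams: video tokens read from audio tokens and vice versa. The function names (`cross_attend`, `fuse_step`) and the toy vectors are hypothetical and are not taken from the Ovi codebase.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(queries, context):
    """Each query vector gathers a weighted average of the context vectors."""
    dim = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every context token.
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        out.append([sum(w * c[i] for w, c in zip(weights, context)) for i in range(dim)])
    return out

def fuse_step(video_tokens, audio_tokens):
    """One symmetric exchange (assumed design): each stream adds what it read from the other."""
    video_read = cross_attend(video_tokens, audio_tokens)
    audio_read = cross_attend(audio_tokens, video_tokens)
    new_video = [[v + r for v, r in zip(tok, read)]
                 for tok, read in zip(video_tokens, video_read)]
    new_audio = [[a + r for a, r in zip(tok, read)]
                 for tok, read in zip(audio_tokens, audio_read)]
    return new_video, new_audio

# Toy example: 3 video tokens and 2 audio tokens, each of width 2.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio = [[0.5, 0.5], [1.0, -1.0]]
new_video, new_audio = fuse_step(video, audio)
```

Each stream keeps its own length and width after the exchange, so the two backbones could in principle continue as separate stacks while still sharing timing cues at every such step.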

Quick Start & Requirements

Installation involves cloning the repository, creating a Python virtual environment, and installing the dependencies listed in requirements.txt. Prerequisites include PyTorch (v2.5.1) and Flash Attention. Model weights must be downloaded separately. A Gradio app is provided for interactive use. The minimum GPU VRAM is 32 GB; enabling fp8 quantization and CPU offload reduces this to 24 GB, though these options may slightly degrade quality and increase runtime.
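The VRAM guidance above can be expressed as a small decision helper. This is a minimal sketch: the thresholds (32 GB and 24 GB) come from this summary, while the function name `choose_memory_mode` and the setting keys `fp8` and `cpu_offload` are hypothetical and may not match the project's actual configuration options.

```python
def choose_memory_mode(vram_gb):
    """Pick memory-saving settings from the available GPU VRAM in GB.

    Thresholds follow the summary: 32 GB runs without memory savers,
    24 GB needs fp8 quantization plus CPU offload.
    """
    if vram_gb >= 32:
        # Enough memory for the default path; no quality or speed trade-off.
        return {"fp8": False, "cpu_offload": False}
    if vram_gb >= 24:
        # Fits only with both savers on; expect slightly lower quality
        # and longer runtime.
        return {"fp8": True, "cpu_offload": True}
    raise ValueError(
        f"{vram_gb} GB is below the 24 GB minimum stated for fp8 + CPU offload"
    )

settings_for_24gb = choose_memory_mode(24)
settings_for_48gb = choose_memory_mode(48)
```

In practice the available VRAM would be read from the GPU at startup; that step is left out so the sketch stays independent of any particular library.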

Highlighted Details

  • Generates 5-second videos at 24 FPS, 720x720 resolution, with various aspect ratios.
  • Flexible input: text-only, text+image, and an 'i2v' mode using an image generation model for initial frames.
  • Advanced prompt formatting uses special tags for speech (<S>, <E>) and audio descriptions (<AUDCAP>, <ENDAUDCAP>).
  • Example prompts and GPT-assisted prompt creation are provided.
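The prompt tags listed above can be assembled with a small helper. This is a sketch under assumptions: the tag names (<S>, <E>, <AUDCAP>, <ENDAUDCAP>) come from this summary, but the helper names, the spacing between pieces, and the placement of the audio description at the end of the prompt are illustrative choices. The project's own example prompts are the reference for the expected layout.

```python
def wrap_speech(text):
    """Mark spoken words with the speech tags."""
    text = text.strip()
    if not text:
        raise ValueError("speech text must not be empty")
    return f"<S>{text}<E>"

def wrap_audio_caption(text):
    """Mark a description of the soundtrack with the audio-caption tags."""
    text = text.strip()
    if not text:
        raise ValueError("audio description must not be empty")
    return f"<AUDCAP>{text}<ENDAUDCAP>"

def build_prompt(scene_parts, audio_description):
    """Join (narration, spoken) pairs, then append the audio description.

    `spoken` may be None for a narration-only segment.
    """
    pieces = []
    for narration, spoken in scene_parts:
        pieces.append(narration.strip())
        if spoken:
            pieces.append(wrap_speech(spoken))
    pieces.append(wrap_audio_caption(audio_description))
    return " ".join(pieces)

prompt = build_prompt(
    [("A chef looks up from the stove and says,", "Dinner is ready.")],
    "Warm voice, simmering pot",
)
```

Keeping the tag wrapping in separate functions makes it easy to check that every opening tag has its matching closing tag before a prompt is sent to the model.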
Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 5
  • Issues (30d): 35

Star History

  • 897 stars in the last 30 days
