RealVideo by zai-org

Real-time streaming conversational video system

Created 2 months ago

302 stars

Top 88.6% on SourcePulse

Project Summary

RealVideo is a real-time conversational video system that turns text interactions into continuous, high-fidelity video responses. It is aimed at users and developers who want AI-driven video generation, and it produces lip-synced video output directly from text prompts. Separate AI models handle audio synthesis and visual synthesis.

How It Works

This WebSocket-based system takes text input and uses the GLM-4.5-AirX and GLM-TTS models to generate an AI voice response. Its core innovation is autoregressive diffusion with DiT models, which generates video frames synchronized to that audio and enables real-time lip-syncing for any input image and audio. The WebSocket connection provides bidirectional communication, and the modular design supports continuous video generation.
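The flow described above (text in, voice response, then video generated block by block) can be sketched roughly as follows. This is a minimal illustration, not RealVideo's code: every function name here is hypothetical, the model calls are replaced by trivial stubs, and the WebSocket transport is omitted so the example stays self-contained.

```python
import asyncio
from typing import AsyncIterator, List

# All stage functions below are hypothetical stand-ins, not RealVideo's API.

async def generate_reply(user_text: str) -> str:
    """Stand-in for the language model step (GLM-4.5-AirX in the real system)."""
    return f"echo: {user_text}"

async def synthesize_speech(reply: str) -> List[float]:
    """Stand-in for the TTS step: one fake audio sample per character."""
    return [0.0] * len(reply)

async def generate_video_blocks(
    audio: List[float], block_size: int
) -> AsyncIterator[List[str]]:
    """Stand-in for block-wise video generation.

    Yields one block of frame labels per audio chunk, so a consumer can
    start playback before the whole clip has been generated.
    """
    for start in range(0, len(audio), block_size):
        chunk = audio[start:start + block_size]
        yield [f"frame-{start + i}" for i in range(len(chunk))]

async def handle_message(user_text: str, block_size: int = 4) -> List[List[str]]:
    """Run the three stages in order and collect the generated blocks."""
    reply = await generate_reply(user_text)
    audio = await synthesize_speech(reply)
    blocks: List[List[str]] = []
    async for block in generate_video_blocks(audio, block_size):
        # A real server would push each block to the client here
        # instead of collecting them in a list.
        blocks.append(block)
    return blocks

if __name__ == "__main__":
    print(asyncio.run(handle_message("hi")))
```

The async generator is the part that mirrors the streaming idea: frames are produced in blocks rather than as one finished video.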

Quick Start & Requirements

Primary Install: pip3 install -r requirements.txt
Prerequisites: Python 3.10-3.12, modern browser (WebSocket/Web Audio API support), ZAI API key. Requires downloading the Wan2.2-S2V-14B model.
Hardware: At least 2x 80GB GPUs (e.g., H100, H200) are required to run the application. One GPU is dedicated to the VAE service and the rest handle DiT parallel computation.
Run Command: CUDA_VISIBLE_DEVICES=0,1 bash ./scripts/run_app.sh
Access: http://localhost:8003
Links: Model download links provided via Hugging Face and ModelScope.
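The Hardware line above describes a split in which one GPU serves the VAE and the remaining GPUs run the DiT. A minimal sketch of that partitioning, assuming the device list arrives as a CUDA_VISIBLE_DEVICES-style string: the helper name `split_gpus` is hypothetical, and assigning the first listed device to the VAE is an assumption, since the summary does not say which device the project picks.

```python
import os
from typing import Dict, List

def split_gpus(visible_devices: str) -> Dict[str, List[int]]:
    """Split a comma-separated device list into a VAE GPU and DiT GPUs.

    Assumption: the first listed device goes to the VAE service and all
    remaining devices go to DiT parallel computation.
    """
    ids = [int(part) for part in visible_devices.split(",") if part.strip()]
    if len(ids) < 2:
        raise ValueError(
            "need at least 2 GPUs: one for the VAE service, one or more for DiT"
        )
    return {"vae": ids[:1], "dit": ids[1:]}

if __name__ == "__main__":
    # Falls back to the two-GPU layout used in the Run Command above.
    devices = os.environ.get("CUDA_VISIBLE_DEVICES") or "0,1"
    print(split_gpus(devices))
```

With the Run Command's `CUDA_VISIBLE_DEVICES=0,1`, this yields one VAE device and one DiT device, which matches the stated two-GPU minimum.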

Highlighted Details

Model Integration: Features convenient voice cloning and text-to-audio generation capabilities.
Modular Design: Employs a clear code structure, enhancing maintainability and extensibility.
Real-time Performance: Achieves smooth real-time generation, with DiT block generation times that can fall under 500ms (e.g., 306.39ms for 1 block at sp size 4 with 2 denoising steps and compilation enabled).
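The timing figures above can be read as a simple budget check. The sketch below treats 500ms as the per-block real-time budget, which is an assumption drawn from the wording above rather than a documented constant; the function names are hypothetical.

```python
# Assumed per-block budget, taken from the "under 500ms" figure above.
REALTIME_BUDGET_MS = 500.0

def headroom_ms(block_time_ms: float, budget_ms: float = REALTIME_BUDGET_MS) -> float:
    """Return how many milliseconds of slack remain within the budget."""
    return budget_ms - block_time_ms

def is_realtime(block_time_ms: float, budget_ms: float = REALTIME_BUDGET_MS) -> bool:
    """A block keeps up with playback only if it finishes inside the budget."""
    return headroom_ms(block_time_ms, budget_ms) > 0

if __name__ == "__main__":
    measured = 306.39  # example figure quoted above
    print(is_realtime(measured), round(headroom_ms(measured), 2))
```

For the quoted 306.39ms measurement, the slack under this assumed budget is 500 - 306.39 = 193.61ms per block.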

Maintenance & Community

Specific details regarding maintainers, community channels (like Discord/Slack), or roadmaps were not present in the provided README.

Licensing & Compatibility

The README does not specify the project's license type or provide compatibility notes for commercial use.

Limitations & Caveats

The system has significant hardware requirements: at least two high-end 80GB GPUs. Real-time performance depends on diffusion model blocks being generated quickly enough, with block times under 500ms cited above. An active ZAI API key is required, and the model path must be configured manually.
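These caveats lend themselves to a quick check before launching. The following is a minimal sketch under stated assumptions: `preflight` is a hypothetical helper, not part of RealVideo, and how the project actually reads its API key and model path is not described in this summary, so both are passed in as plain arguments.

```python
from pathlib import Path
from typing import List, Optional

def preflight(api_key: Optional[str], model_dir: str, gpu_count: int) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: it only mirrors the three caveats listed above.
    """
    problems: List[str] = []
    if not api_key:
        problems.append("missing ZAI API key")
    if not Path(model_dir).is_dir():
        problems.append(f"model directory not found: {model_dir}")
    if gpu_count < 2:
        problems.append("at least 2 GPUs are required")
    return problems

if __name__ == "__main__":
    # Placeholder values for illustration only.
    for problem in preflight(api_key=None, model_dir="./models", gpu_count=1):
        print("preflight:", problem)
```

Returning a list instead of raising on the first failure lets a user see every missing prerequisite in one pass.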

Health Check

Last Commit: 2 months ago
Responsiveness: Inactive

Star History

17 stars in the last 30 days