RealVideo by zai-org

Real-time streaming conversational video system

Created 1 month ago
274 stars

Top 94.4% on SourcePulse

Project Summary

RealVideo is a real-time conversational video system that turns text interactions into continuous, high-fidelity video responses. It targets users and developers who want AI-driven video generation, and it produces lip-synced video output directly from text prompts. The system relies on AI models for both audio and visual synthesis.

How It Works

This WebSocket-based system takes text input and uses the GLM-4.5-AirX and GLM-TTS models to generate an AI voice response. Its core innovation is the use of autoregressive diffusion (specifically, DiT models) to generate video frames synchronized to that audio, which enables real-time lip-syncing for any input image and audio. The modular design supports bidirectional communication and continuous video generation.
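The flow described above (text in, voice reply, then video blocks synchronized to the audio) can be sketched roughly as follows. This is an illustrative outline only, not the project's code: `generate_reply`, `synthesize_speech`, `stream_video`, `respond`, and `VideoBlock` are hypothetical names, the model calls are replaced by trivial stubs, and the "one video block per audio chunk" pairing is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class VideoBlock:
    index: int          # position of the block in the output stream
    audio_chunk: bytes  # audio segment this block is lip-synced to


def generate_reply(user_text: str) -> str:
    """Hypothetical stub for the language model step (GLM-4.5-AirX in the project)."""
    return f"echo: {user_text}"


def synthesize_speech(reply: str, chunk_size: int = 8) -> List[bytes]:
    """Hypothetical stub for the TTS step (GLM-TTS in the project).

    Returns the "audio" as fixed-size byte chunks so later stages can stream.
    """
    data = reply.encode("utf-8")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def stream_video(audio_chunks: List[bytes]) -> Iterator[VideoBlock]:
    """Hypothetical stub for the DiT stage.

    Assumption: one video block per audio chunk, yielded in order, so a
    client could play block N while block N+1 is still being generated.
    """
    for i, chunk in enumerate(audio_chunks):
        yield VideoBlock(index=i, audio_chunk=chunk)


def respond(user_text: str) -> Iterator[VideoBlock]:
    """Chain the three stages: text -> reply -> audio chunks -> video blocks."""
    reply = generate_reply(user_text)
    return stream_video(synthesize_speech(reply))
```

Because `respond` returns an iterator, a server loop could forward each block over the WebSocket as soon as it is produced rather than waiting for the whole response.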

Quick Start & Requirements

  • Primary Install: pip3 install -r requirements.txt
  • Prerequisites: Python 3.10-3.12, modern browser (WebSocket/Web Audio API support), ZAI API key. Requires downloading the Wan2.2-S2V-14B model.
  • Hardware: At least 2x 80GB GPUs (e.g., H100, H200) are required to run the application; one GPU is dedicated to the VAE service and the rest handle DiT parallel computation.
  • Run Command: CUDA_VISIBLE_DEVICES=0,1 bash ./scripts/run_app.sh
  • Access: http://localhost:8003
  • Links: Model download links provided via Hugging Face and ModelScope.
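The GPU split in the hardware note (one device for the VAE service, the remainder for DiT) can be illustrated with a small helper that parses a `CUDA_VISIBLE_DEVICES`-style string. This is a sketch, not the project's launcher logic: `split_gpus` is a hypothetical name, and assigning the first listed device to the VAE is an assumption, since the summary does not say which device is used.

```python
from typing import Dict, List


def split_gpus(cuda_visible_devices: str) -> Dict[str, List[int]]:
    """Split a comma-separated device list into VAE and DiT groups."""
    ids = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    if len(ids) < 2:
        # The summary states that at least two GPUs are required.
        raise ValueError("need at least two GPUs: one for VAE, one or more for DiT")
    # Assumption: the first listed device serves the VAE; the rest run DiT.
    return {"vae": ids[:1], "dit": ids[1:]}
```

With the run command's `CUDA_VISIBLE_DEVICES=0,1`, this helper would place device 0 in the VAE group and device 1 in the DiT group; listing more devices only grows the DiT group.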

Highlighted Details

  • Model Integration: Features convenient voice cloning and text-to-audio generation capabilities.
  • Modular Design: Employs a clear code structure, enhancing maintainability and extensibility.
  • Real-time Performance: Achieves smooth real-time generation, with DiT block generation times that can fall under 500ms (e.g., 306.39ms for 1 block with an sp size of 4, 2 denoising steps, and compilation enabled).
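To make the timing figure concrete, the check below compares a measured block generation time against a per-block budget. It is a minimal sketch under one assumption: that the 500ms figure quoted above acts as the real-time threshold. The function names `is_realtime` and `headroom_ms` are hypothetical.

```python
def is_realtime(block_ms: float, budget_ms: float = 500.0) -> bool:
    """Return True when one block is generated faster than the budget."""
    return block_ms < budget_ms


def headroom_ms(block_ms: float, budget_ms: float = 500.0) -> float:
    """Spare time per block; a negative value means the budget is missed."""
    return budget_ms - block_ms
```

For the quoted measurement of 306.39ms, `is_realtime(306.39)` holds and the headroom is about 193.61ms per block under the assumed 500ms budget.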

Maintenance & Community

Specific details regarding maintainers, community channels (like Discord/Slack), or roadmaps were not present in the provided README.

Licensing & Compatibility

The README does not specify the project's license type or provide compatibility notes for commercial use.

Limitations & Caveats

The system imposes significant hardware requirements, mandating at least two high-end 80GB GPUs. Real-time performance is contingent on achieving specific generation speeds for diffusion model blocks. An active ZAI API key is necessary for operation, and the model path requires manual configuration.

Health Check

  • Last Commit: 3 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 212 stars in the last 30 days
