Research paper for test-time scaling (TTS) in video generation
Video-T1 addresses the challenge of improving video generation quality and prompt consistency through test-time scaling (TTS). It targets researchers and practitioners in generative AI, offering a method to enhance existing video generation models without retraining.
How It Works
Video-T1 employs a two-pronged search strategy: Random Linear Search and Tree of Frames (ToF) Search. Random Linear Search involves sampling Gaussian noises, generating video clips via step-by-step denoising, and selecting the highest-scoring output based on test verifiers. The ToF Search refines this by dividing the process into stages: image-level alignment for later frames, dynamic prompt guidance focusing on motion stability and physical plausibility, and a final assessment of overall video quality against text prompts. This staged, guided search allows for more efficient exploration of the generation space, leading to higher quality outputs.
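The two search strategies described above can be illustrated with a toy sketch. This is not the project's implementation: `toy_generate`, `toy_verifier`, `random_linear_search`, `staged_tree_search`, and the `keep` parameter are hypothetical names, a "video" is just a list of numbers (one per frame), and the stage-specific verifiers and dynamic prompt guidance of ToF are replaced by a single score on partial videos. It only shows the control flow: best-of-N selection versus staged expansion and pruning.

```python
import random
from typing import List, Sequence

Video = List[float]  # toy stand-in: one number per "frame"


def toy_generate(seed: int, num_frames: int = 4) -> Video:
    """Hypothetical generator: maps a noise seed to a toy 'video'."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_frames)]


def toy_verifier(video: Sequence[float]) -> float:
    """Hypothetical verifier: higher is better (frames closer to 0)."""
    return -sum(x * x for x in video)


def random_linear_search(num_samples: int, base_seed: int = 0) -> Video:
    """Best-of-N: sample N noises, generate each video fully, keep the top score."""
    candidates = [toy_generate(base_seed + i) for i in range(num_samples)]
    return max(candidates, key=toy_verifier)


def staged_tree_search(
    branching: Sequence[int], keep: int = 2, base_seed: int = 0
) -> Video:
    """Grow videos frame by frame. At each stage, expand every kept prefix
    into `branching[stage]` children, then prune to the `keep` best prefixes."""
    rng = random.Random(base_seed)
    beams: List[Video] = [[]]
    for width in branching:
        children = [
            prefix + [rng.gauss(0.0, 1.0)]
            for prefix in beams
            for _ in range(width)
        ]
        children.sort(key=toy_verifier, reverse=True)
        beams = children[:keep]
    return beams[0]


if __name__ == "__main__":
    best_of_8 = random_linear_search(num_samples=8)
    tree_best = staged_tree_search(branching=[3, 3, 2])
    print("best-of-8 score:", toy_verifier(best_of_8))
    print("tree search score:", toy_verifier(tree_best))
```

In this sketch, the tree variant scores partial prefixes and discards weak branches early, so compute is spent on promising candidates instead of on full generations that are rejected only at the end.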
Quick Start & Requirements
Create a conda environment (conda create -n videot1 python==3.10), activate it (conda activate videot1), and install dependencies (pip install -r requirements.txt). Additionally, clone and install LLaVA-NeXT (git clone https://github.com/LLaVA-VL/LLaVA-NeXT && cd LLaVA-NeXT && pip install --no-deps -e ".[train]").
Run inference with the main script (python -m videot1.py --prompt "..." --video_name ...). Multi-GPU inference is supported via videot1_multigpu.py.
Highlighted Details
Configurable parameters include num_inference_steps, video_branching_factors, and image_branching_factors.
Maintenance & Community
The project is associated with Tsinghua University. Further community engagement details (Discord/Slack, roadmap) are not explicitly provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is presented as a research contribution (ICCV 2025) and may be in an early stage. Specific hardware requirements for optimal performance and detailed compatibility information are not fully elaborated. The need for multiple large model checkpoints implies a substantial resource footprint.