Ditto: Scaling instruction-based video editing with synthetic data
Top 68.8% on SourcePulse
Ditto addresses the critical data scarcity challenge in instruction-based video editing by introducing a scalable pipeline for generating high-quality synthetic data. This framework enables the training of state-of-the-art models like Editto, offering researchers and practitioners a robust solution for advanced video manipulation. The primary benefit is enabling high-fidelity, instruction-driven video edits at scale, overcoming limitations of existing datasets and models.
How It Works
Ditto employs a novel data generation pipeline that synergizes the creative diversity of image editing tools with an in-context video generator. To manage the cost-quality trade-off, it utilizes an efficient, distilled model architecture enhanced by a temporal enhancer for improved coherence and reduced computational load. An intelligent agent drives the process, generating diverse instructions and ensuring rigorous quality control for scalable data production. The resulting Ditto-1M dataset comprises one million high-fidelity video editing examples.
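The data-generation loop described above can be sketched in a few lines. This is a structural illustration only, assuming the pipeline's stages from the description; every function and class name below is a hypothetical stand-in, not the project's actual API.

```python
# Sketch of Ditto's data-generation loop (structure inferred from the
# description above; all names are hypothetical stand-ins, not the real API).
from dataclasses import dataclass


@dataclass
class EditSample:
    source_video: str
    instruction: str
    edited_video: str
    quality: float


def generate_instruction(seed: int) -> str:
    """Stand-in for the agent that proposes diverse edit instructions."""
    styles = ["make it snow", "turn day into night", "apply a watercolor style"]
    return styles[seed % len(styles)]


def edit_keyframe(video: str, instruction: str) -> str:
    """Stand-in for the image-editing tool applied to a keyframe."""
    return f"{video}::keyframe[{instruction}]"


def propagate(edited_keyframe: str) -> str:
    """Stand-in for the in-context video generator plus temporal enhancer."""
    return f"{edited_keyframe}->video"


def score_quality(edited_video: str) -> float:
    """Stand-in for the agent's automatic quality check."""
    return 0.9  # placeholder score


def make_samples(videos: list[str], threshold: float = 0.5) -> list[EditSample]:
    samples = []
    for i, video in enumerate(videos):
        instruction = generate_instruction(i)
        keyframe = edit_keyframe(video, instruction)
        edited = propagate(keyframe)
        q = score_quality(edited)
        if q >= threshold:  # keep only samples that pass quality control
            samples.append(EditSample(video, instruction, edited, q))
    return samples
```

At scale, this filter-as-you-generate design is what lets the agent keep only high-fidelity examples while the pipeline produces the one million samples of Ditto-1M.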
Quick Start & Requirements
Installation involves creating a Conda environment (python=3.10), activating it, and running pip install -e . from the repository root. Users must download base models (e.g., Wan-AI/Wan2.1-VACE-14B) and Ditto-specific models from Hugging Face or Google Drive. Inference can be performed via the infer.sh script or python inference/infer_ditto.py, requiring input/output video paths, a prompt, a LoRA path, and a device ID. ComfyUI integration is also supported, requiring its setup and specific custom nodes. Links to the paper, project page, model weights, and dataset are provided.
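The steps above might look like the following in practice. The conda/pip commands follow the description directly; the inference flag names and file paths are assumptions for illustration, so check the repository's infer.sh for the actual arguments.

```shell
# Environment setup, as described in the quick-start section.
conda create -n ditto python=3.10 -y
conda activate ditto
pip install -e .

# Download base weights (e.g., Wan-AI/Wan2.1-VACE-14B) and the Ditto-specific
# models from Hugging Face or Google Drive before running inference.

# Inference via the provided wrapper script:
bash infer.sh

# ...or by calling the Python entry point directly
# (flag names and paths below are assumed, not confirmed):
python inference/infer_ditto.py \
    --input_video input.mp4 \
    --output_video output.mp4 \
    --prompt "turn the scene into winter" \
    --lora_path ./checkpoints/ditto_lora.safetensors \
    --device 0
```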
Highlighted Details
Maintenance & Community
The project is associated with academic researchers and leverages foundational models like Wan, VACE, and QwenVL. The codebase is based on DiffSynth-Studio. No specific community channels (e.g., Discord, Slack) or explicit roadmap details are provided in the README.
Licensing & Compatibility
The project is licensed under CC BY-NC-SA 4.0 (Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License). This license restricts usage to academic research purposes and prohibits commercial use.
Limitations & Caveats
The code is explicitly provided for academic research purposes only, with a strict non-commercial use clause. Integration via ComfyUI may result in some quality degradation due to the use of quantized and distilled models. As the associated paper is a preprint (2025), the project may represent ongoing research.