DCVC by microsoft

Neural video codec for real-time compression

Created 3 years ago

672 stars

Top 50.4% on SourcePulse

Project Summary

DCVC-RT is a neural video codec designed for practical, real-time use, aimed at researchers and developers who need high compression efficiency and low latency. It offers a single model that covers a wide bitrate range, supports rate control, and unifies YUV and RGB coding. It aims to surpass traditional codecs such as H.266/VTM in both speed and compression ratio.

How It Works

DCVC-RT treats operational cost as a key bottleneck for neural video codec (NVC) speed and is designed to minimize it. It uses implicit temporal modeling in place of complex motion modules, and it works on a single low-resolution latent representation. Together these choices accelerate encoding and decoding without sacrificing compression quality. The codec also uses model integerization so that results are consistent across devices, and a module-bank-based rate control scheme for adaptability.
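To make the motivation for integerization concrete, here is a minimal sketch of why floating-point arithmetic can give different results when the order of operations changes (as it may between devices or kernels), while integer arithmetic cannot. This is a generic illustration, not DCVC-RT's actual integerization scheme; `FRAC_BITS`, `to_fixed`, and `distinct_sums` are hypothetical names introduced only for this example.

```python
from itertools import permutations

FRAC_BITS = 16  # hypothetical fixed-point precision, chosen only for this demo


def to_fixed(x: float) -> int:
    """Quantize a float to a fixed-point integer with FRAC_BITS fractional bits."""
    return round(x * (1 << FRAC_BITS))


def distinct_sums(values):
    """Sum `values` left to right in every possible order; return the distinct totals."""
    totals = set()
    for ordering in permutations(values):
        total = 0
        for v in ordering:
            total += v
        totals.add(total)
    return totals


float_values = [0.1, 0.2, 0.3]
int_values = [to_fixed(v) for v in float_values]

float_totals = distinct_sums(float_values)  # float rounding depends on the order
int_totals = distinct_sums(int_values)      # integer addition is exact in any order

print("distinct float totals:", len(float_totals))
print("distinct integer totals:", len(int_totals))
```

An entropy decoder must reproduce the encoder's probability estimates exactly, so even a last-bit difference like the one above can corrupt decoding. That is the general reason a codec would prefer integer arithmetic for cross-device consistency.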

Quick Start & Requirements

Install: Requires Python 3.12, CUDA 12.6, and PyTorch 2.6. Set up the environment with conda, then install the Python dependencies with pip install -r requirements.txt. The C++ bitstream writer and the CUDA kernels need cmake, g++, and ninja-build; build them by running pip install . in src/cpp/ and in src/layers/extensions/inference/, respectively.
Prerequisites: NVIDIA GPU, CUDA 12.6, Python 3.12.
Pretrained Models: Download from provided links and place in ./checkpoints.
Testing: Use test_video.py with specified model paths, rate numbers, and dataset configurations.
Resources: Arithmetic coding runs on the CPU, so configuring the CPU to run at a high-performance frequency is recommended when writing bitstreams.
Docs: Usage, Test Conditions
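Before building the extensions, it can help to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of the repository: `REQUIRED`, `parse_major_minor`, `find_mismatches`, and `detect_versions` are hypothetical helpers, and the check assumes that only the major and minor version numbers matter.

```python
import sys

# Versions taken from the requirements listed above (major, minor).
REQUIRED = {"python": (3, 12), "torch": (2, 6), "cuda": (12, 6)}


def parse_major_minor(version: str) -> tuple:
    """Turn a version string such as '2.6.0+cu126' into (2, 6)."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def find_mismatches(found: dict) -> list:
    """Compare detected version strings against REQUIRED; return readable problems."""
    problems = []
    for name, wanted in REQUIRED.items():
        have = found.get(name)
        if have is None:
            problems.append(f"{name}: not found (want {wanted[0]}.{wanted[1]})")
        elif parse_major_minor(have) != wanted:
            problems.append(f"{name}: found {have}, want {wanted[0]}.{wanted[1]}")
    return problems


def detect_versions() -> dict:
    """Collect the Python, PyTorch, and CUDA versions visible to this interpreter."""
    found = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    try:
        import torch
        found["torch"] = torch.__version__
        found["cuda"] = torch.version.cuda  # None on CPU-only builds
    except ImportError:
        pass  # torch missing: reported as "not found" by find_mismatches
    return found


if __name__ == "__main__":
    issues = find_mismatches(detect_versions())
    print("\n".join(issues) if issues else "Environment matches the listed versions.")
```

A mismatch here does not necessarily mean the code will fail. It only flags a difference from the versions the summary lists.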

Highlighted Details

Achieves 100+ FPS for 1080p and real-time 4K coding.
Offers an average 21% bitrate saving over H.266/VTM for 1080p content.
Intra-frame codec shows 11.1% bitrate reduction over VTM with over 10x faster decoding.
Supports wide bitrate range and dynamic rate control via quantization parameters.
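The headline figures above translate into simple budgets. The sketch below works through the arithmetic: the time available per frame at a given frame rate, and the bitrate left after a percentage saving. The 1000 kbps anchor bitrate is a made-up input for illustration, not a measured result, and the function names are hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to code one frame, in milliseconds, at `fps` frames per second."""
    return 1000.0 / fps


def bitrate_after_saving(anchor_kbps: float, saving_percent: float) -> float:
    """Bitrate needed for equal quality, given a percentage saving over the anchor."""
    return anchor_kbps * (1.0 - saving_percent / 100.0)


# At 100 FPS, each 1080p frame must be coded within 10 ms.
print(f"budget at 100 FPS: {frame_budget_ms(100):.1f} ms per frame")

# A 21% average saving on a hypothetical 1000 kbps VTM stream leaves about 790 kbps.
print(f"bitrate after 21% saving: {bitrate_after_saving(1000, 21):.0f} kbps")
```

The 21% figure is an average, so the saving on any individual sequence can be higher or lower.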

Maintenance & Community

The project is associated with Microsoft and builds upon the DCVC family of models, with contributions from various researchers. Further details on the DCVC family can be found in the DCVC-family section.

Licensing & Compatibility

The project is released under a license that permits use and modification, with specific trademark guidelines for Microsoft assets. Commercial use and closed-source linking are not explicitly addressed here, so confirm the terms in the repository's license file rather than assuming they are permitted.

Limitations & Caveats

The project targets CUDA 12.6 and PyTorch 2.6, and compatibility with other versions may vary. The README notes that on Windows the precision of time.time() can be too low for accurate speed measurements. Arithmetic coding speed depends heavily on CPU performance, and the CPU must be configured manually.
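One common way around the timer precision issue is Python's time.perf_counter(), which is intended for measuring short durations. The following is a minimal sketch of a throughput measurement built on it. `measure_fps`, `process_frame`, and `sync` are hypothetical names, and this is not how test_video.py measures speed.

```python
import time


def measure_fps(process_frame, num_frames: int, sync=None) -> float:
    """Return frames per second for `process_frame`, timed with time.perf_counter().

    `sync` is an optional zero-argument callable run before reading the clock.
    When timing GPU work, torch.cuda.synchronize could be passed here so that
    queued kernels are finished before the timestamps are taken.
    """
    if sync is not None:
        sync()
    start = time.perf_counter()
    for index in range(num_frames):
        process_frame(index)
    if sync is not None:
        sync()
    elapsed = time.perf_counter() - start
    if elapsed <= 0:
        raise RuntimeError("elapsed time too small to measure; use more frames")
    return num_frames / elapsed


def fake_codec(index: int) -> int:
    """Stand-in workload for illustration; replace with a real per-frame call."""
    return sum(range(10_000))


fps = measure_fps(fake_codec, num_frames=50)
print(f"measured throughput: {fps:.1f} frames/s")
```

Timing many frames in one interval, as above, also reduces the influence of per-call timer overhead on the result.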
