Q-Insight by bytedance

Advanced models for visual quality assessment and reasoning

Created 1 year ago

309 stars

Top 86.8% on SourcePulse

Project Summary

Summary

The Q-Insight family addresses image and video quality assessment (IQA/VQA) for AI-generated content. It targets researchers and engineers and covers score regression, degradation perception, and comparison reasoning across natural and synthetic media, with an emphasis on strong performance, out-of-domain generalization, and detailed reasoning.

How It Works

Q-Insight applies visual reinforcement learning (RL) to IQA and achieves strong generalization by converting visual representations into text representations. VQ-Insight uses a reasoning-style Vision-Language Model (VLM) to assess AI-generated video quality, which enables nuanced preference comparison and scoring with explicit reasoning. RALI, a lightweight CLIP-based scorer, validates that RL training drives generalization in MLLM-based IQA, and it offers comparable accuracy with significantly fewer parameters and less inference time.

Quick Start & Requirements

Clone the repo (git clone https://github.com/bytedance/Q-Insight.git) and run bash setup.sh. VQ-Insight additionally requires cd src/eval/qwen-vl-utils && pip install -e .[decord]. RALI requires downloading the pretrained weights manually and placing them in Q-Insight/checkpoints/. Demos are provided for various IQA/VQA tasks, along with dataset preparation instructions.
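The setup steps above can be collected into one script. This is a minimal sketch that uses only the commands quoted in this summary. The DRY_RUN switch and the run helper are illustrative additions, not part of the repository, and the script assumes that setup.sh and the checkpoints/ directory sit at the repository root. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Quick-start sketch for Q-Insight.
# Defaults to a dry run that only prints each command; set DRY_RUN=0 to execute.
set -eu

DRY_RUN="${DRY_RUN:-1}"
REPO_URL="https://github.com/bytedance/Q-Insight.git"
REPO_DIR="Q-Insight"      # directory that git clone creates by default
CKPT_DIR="checkpoints"    # assumed to be relative to the repository root

# Print a command, then run it unless this is a dry run (illustrative helper).
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

quick_start() {
  # Base setup shared by all models.
  run git clone "$REPO_URL"
  run cd "$REPO_DIR"
  run bash setup.sh

  # VQ-Insight only: install the bundled qwen-vl-utils with the decord extra.
  run cd src/eval/qwen-vl-utils
  run pip install -e ".[decord]"
  run cd ../../..

  # RALI only: the weights are downloaded manually, so just prepare the folder.
  run mkdir -p "$CKPT_DIR"
  echo "Place the manually downloaded RALI weights in $REPO_DIR/$CKPT_DIR/"
}

quick_start
```

Saved under a hypothetical name such as quick_start.sh, the script can be reviewed with bash quick_start.sh and then executed with DRY_RUN=0 bash quick_start.sh. The VQ-Insight and RALI steps can be dropped if only Q-Insight is needed.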

Highlighted Details

Q-Insight: NeurIPS 2025 spotlight (Top 3%).
VQ-Insight: AAAI 2026 oral presentation.
RALI: ICLR 2026 oral presentation.
Q-Insight shows superior out-of-domain performance compared to existing IQA metrics.
RALI achieves comparable accuracy to Q-Insight using ~4% of its parameters and inference time.

Maintenance & Community

Recent releases include the VQ-Insight and RALI code and models (Feb 2026). The key papers were accepted to NeurIPS 2025, AAAI 2026, and ICLR 2026. No specific community channels are listed, and no detailed roadmap is provided beyond planned features.
