Q-Insight by bytedance

Advanced models for visual quality assessment and reasoning

Created 9 months ago
253 stars

Top 99.3% on SourcePulse

Project Summary

Summary

The Q-Insight family of models addresses image and video quality assessment (IQA/VQA) for both natural and AI-generated media. It targets researchers and engineers who need score regression, degradation perception, and comparison reasoning, and it emphasizes strong performance, out-of-domain generalization, and detailed reasoning behind each judgment.

How It Works

Q-Insight applies visual reinforcement learning (RL) to IQA and achieves strong generalization by converting visual representations into text representations. VQ-Insight uses a reasoning-style vision-language model (VLM) to assess AI-generated video quality, supporting preference comparison and scoring with explicit reasoning. RALI, a lightweight CLIP-based scorer, is used to validate that RL training drives generalization in IQA based on multimodal large language models (MLLMs); it reaches accuracy comparable to Q-Insight with roughly 4% of its parameters and inference time.
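The summary does not describe how the RL reward is computed, so the following is only a minimal sketch of one common design for score regression: a rule-based reward that pays out when the predicted score lands close to the human rating. The `<answer>` tag format, the function names, and the `tolerance` value are all hypothetical and are not taken from the project.

```python
import re
from typing import Optional

# Hypothetical response format: the model is assumed to finish its reasoning
# with a tag such as <answer>3.7</answer>. The project's real format may differ.
ANSWER_PATTERN = re.compile(r"<answer>\s*([-+]?\d+(?:\.\d+)?)\s*</answer>")


def parse_score(response: str) -> Optional[float]:
    """Extract the predicted quality score, or None if no answer tag is found."""
    match = ANSWER_PATTERN.search(response)
    return float(match.group(1)) if match else None


def score_reward(response: str, target: float, tolerance: float = 0.35) -> float:
    """Return 1.0 when the predicted score is within `tolerance` of `target`.

    `tolerance` is an illustrative value, not one taken from the project.
    """
    predicted = parse_score(response)
    if predicted is None:
        return 0.0  # unparseable output earns no reward
    return 1.0 if abs(predicted - target) < tolerance else 0.0
```

A binary reward of this kind can be checked automatically against ground-truth ratings, which is what makes it usable as an RL training signal without a learned reward model.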

Quick Start & Requirements

Clone the repository with git clone https://github.com/bytedance/Q-Insight.git, then run bash setup.sh. For VQ-Insight, additionally run cd src/eval/qwen-vl-utils && pip install -e .[decord]. RALI requires downloading the pretrained weights manually and placing them in Q-Insight/checkpoints/. The repository provides demos for various IQA/VQA tasks and documents dataset preparation.
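Because the RALI weights must be placed by hand, a quick check before running a demo can save a failed start. The sketch below is not part of the repository; the function name and the list of file extensions are assumptions, since the summary does not state the checkpoint file format.

```python
from pathlib import Path
from typing import List

# Assumed weight-file extensions; the actual RALI checkpoint format is not
# stated in this summary.
WEIGHT_SUFFIXES = {".pt", ".pth", ".bin", ".safetensors"}


def find_weights(checkpoint_dir: str) -> List[Path]:
    """Return weight files found directly inside `checkpoint_dir`, sorted by path."""
    root = Path(checkpoint_dir)
    if not root.is_dir():
        return []  # missing directory is treated the same as "no weights"
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix in WEIGHT_SUFFIXES
    )


if __name__ == "__main__":
    weights = find_weights("Q-Insight/checkpoints")
    if weights:
        print("Found weights:", ", ".join(p.name for p in weights))
    else:
        print("No weights found in Q-Insight/checkpoints/; download them first.")
```

Run it from the directory that contains the cloned Q-Insight folder so the relative path resolves.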

Highlighted Details

  • Q-Insight: NeurIPS 2025 spotlight (Top 3%).
  • VQ-Insight: AAAI 2026 oral presentation.
  • RALI: ICLR 2026 oral presentation.
  • Q-Insight shows superior out-of-domain performance compared to existing IQA metrics.
  • RALI achieves comparable accuracy to Q-Insight using ~4% of its parameters and inference time.

Maintenance & Community

Recent releases include the VQ-Insight and RALI code and models (Feb 2026). The key papers were accepted to NeurIPS 2025, AAAI 2026, and ICLR 2026. The project lists no community channels and no roadmap beyond its planned features.

Licensing & Compatibility

The README does not specify a software license, which may restrict commercial use or closed-source integration.

Limitations & Caveats

LoRA fine-tuning support and a Gradio demo are planned but not yet implemented.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 19 stars in the last 30 days
