bytedance
Advanced models for visual quality assessment and reasoning
Top 99.3% on SourcePulse
Summary
The Q-Insight family addresses image and video quality assessment (IQA/VQA) for AI-generated content. It targets researchers and engineers, offering superior performance, out-of-domain generalization, and detailed reasoning for tasks such as score regression, degradation perception, and comparison reasoning across natural and synthetic media.
How It Works
Q-Insight applies visual reinforcement learning (RL) to IQA and achieves strong generalization by converting visual representations into text representations. VQ-Insight uses a reasoning-style vision-language model (VLM) to assess AI-generated video quality, which enables nuanced preference comparison and scoring with explicit reasoning. RALI, a lightweight CLIP-based scorer, validates that RL training drives generalization in IQA based on multimodal large language models (MLLMs), and it offers comparable accuracy with far fewer parameters and a shorter inference time.
Quick Start & Requirements
Clone the repo (git clone https://github.com/bytedance/Q-Insight.git) and run bash setup.sh. VQ-Insight additionally requires running cd src/eval/qwen-vl-utils && pip install -e .[decord]. Demos are provided for various IQA/VQA tasks. RALI requires downloading the pretrained weights manually and placing them in Q-Insight/checkpoints/. Detailed dataset preparation instructions are provided.
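The steps above can be laid out as one script. This is a minimal sketch rather than the project's own installer: it assumes setup.sh is run from the repository root, and the RUN_INSTALL switch, the CKPT_DIR variable, and the has_rali_weights and install_all helpers are hypothetical names introduced here for illustration. The weight check only tests that the checkpoint directory is non-empty, since the expected file names are not stated.

```shell
#!/usr/bin/env bash
# Sketch of the quick-start steps; not taken from the repository's own scripts.
set -euo pipefail

REPO_URL="https://github.com/bytedance/Q-Insight.git"
REPO_DIR="Q-Insight"                            # directory name git clone creates by default
CKPT_DIR="${CKPT_DIR:-$REPO_DIR/checkpoints}"   # where RALI weights are expected (hypothetical variable)

# Succeeds only if the directory exists and contains at least one entry.
has_rali_weights() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# Body is wrapped in parentheses so it runs in a subshell and the cd calls
# do not change the caller's working directory.
install_all() (
  git clone "$REPO_URL"
  cd "$REPO_DIR"
  bash setup.sh                                 # assumption: run from the repository root
  cd src/eval/qwen-vl-utils                     # needed for VQ-Insight only
  pip install -e ".[decord]"                    # quoted so shells such as zsh do not glob the brackets
)

# RUN_INSTALL is a hypothetical opt-in switch, not a flag the project defines.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  install_all
fi

if has_rali_weights "$CKPT_DIR"; then
  echo "RALI weights found in $CKPT_DIR"
else
  echo "No RALI weights in $CKPT_DIR yet; download and place them manually"
fi
```

Running the script with RUN_INSTALL=1 performs the clone and install steps. Running it without that variable only reports whether anything has been placed in the checkpoint directory.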
Highlighted Details
Maintenance & Community
Recent releases include the VQ-Insight and RALI code and models (Feb 2026). Key papers were accepted to NeurIPS 2025, AAAI 2026, and ICLR 2026. No community channels are listed, and no roadmap is given beyond a list of planned features.
Licensing & Compatibility
The README does not specify a software license, which may restrict commercial use or closed-source integration.
Limitations & Caveats
LoRA fine-tuning support and a Gradio demo are planned but not yet implemented.