Visual scoring foundation model for image/video quality and aesthetics assessment
Top 65.6% on sourcepulse
Q-Align is an all-in-one foundation model for visual scoring tasks, including Image Quality Assessment (IQA), Image Aesthetic Assessment (IAA), and Video Quality Assessment (VQA). It is designed for researchers and developers working with multimodal large language models (LMMs) who need a unified approach to evaluating visual content. The model offers efficient fine-tuning capabilities for downstream datasets and aims to simplify the process of visual scoring.
How It Works
Q-Align leverages a LLaVA-style architecture, integrating visual understanding with language models. It employs discrete, text-defined levels to teach LMMs how to perform visual scoring. This approach allows for a unified model that can handle diverse scoring tasks by mapping visual inputs to predefined textual categories or scores, enabling efficient fine-tuning and adaptation to specific datasets.
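The level-to-score mapping described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the five level words follow the Q-Align paper, while the logits are stand-ins for what the LMM would assign to each level token. The continuous score is a softmax-weighted average over the text-defined levels.

```python
import math

# Discrete, text-defined rating levels mapped to numeric scores (5 = best).
LEVELS = {"excellent": 5, "good": 4, "fair": 3, "poor": 2, "bad": 1}

def levels_to_score(level_logits):
    """Collapse per-level logits into one continuous quality score
    via a softmax-weighted average over the rating levels."""
    max_logit = max(level_logits.values())          # for numerical stability
    exp = {k: math.exp(v - max_logit) for k, v in level_logits.items()}
    total = sum(exp.values())
    probs = {k: v / total for k, v in exp.items()}  # softmax over levels
    return sum(probs[k] * LEVELS[k] for k in LEVELS)

# Illustrative logits: the model is fairly confident the image is "good".
score = levels_to_score({"excellent": 1.0, "good": 2.5, "fair": 0.5,
                         "poor": -1.0, "bad": -2.0})
print(round(score, 2))  # a value between 1 (bad) and 5 (excellent), here near 4
```

Because the output is a probability-weighted average rather than a hard class label, the same model head yields fine-grained scores suitable for IQA, IAA, and VQA benchmarks.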
Quick Start & Requirements
The model can be loaded via Hugging Face's AutoModel interface, or by installing the repository (pip install -e .). For training, additional dependencies are required (pip install -e ".[train]" and flash_attn).
Highlighted Details
Requires transformers==4.36.1.
Maintenance & Community
The project is associated with Nanyang Technological University and Shanghai Jiao Tong University. Contact information for authors is provided for queries.
Licensing & Compatibility
The repository appears to be released under a permissive license, but specific details are not explicitly stated in the README. Compatibility with commercial or closed-source projects should be verified.
Limitations & Caveats
The README notes that the v1.1 update is incompatible with older versions (v1.0.1/v1.0.0 and before). Training from scratch requires substantial GPU resources. The specific license for commercial use is not clearly defined.