Q-Align by Q-Future

Visual scoring foundation model for image/video quality and aesthetics assessment

created 1 year ago
471 stars

Top 65.6% on sourcepulse

Project Summary

Q-Align is an all-in-one foundation model for visual scoring tasks: Image Quality Assessment (IQA), Image Aesthetic Assessment (IAA), and Video Quality Assessment (VQA). It targets researchers and developers working with large multimodal models (LMMs) who need a single approach to evaluating visual content. The model can also be fine-tuned efficiently on downstream datasets.

How It Works

Q-Align leverages a LLaVA-style architecture, integrating visual understanding with language models. It employs discrete, text-defined levels to teach LMMs how to perform visual scoring. This approach allows for a unified model that can handle diverse scoring tasks by mapping visual inputs to predefined textual categories or scores, enabling efficient fine-tuning and adaptation to specific datasets.
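As a rough illustration of how discrete text levels can yield a numeric score, the sketch below takes one logit per level word, applies a softmax, and returns the probability-weighted average of the level values. The five level names and the 5-to-1 weights are assumptions based on the Q-Align paper's description, not details stated in this summary, and `levels_to_score` is a hypothetical helper rather than part of the Q-Align codebase.

```python
import math
from typing import Sequence

# Assumed rating levels and their numeric values (best to worst).
LEVELS = ["excellent", "good", "fair", "poor", "bad"]
WEIGHTS = [5.0, 4.0, 3.0, 2.0, 1.0]


def levels_to_score(logits: Sequence[float]) -> float:
    """Convert one logit per level into a score between 1 and 5."""
    if len(logits) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} logits, got {len(logits)}")
    # Numerically stable softmax over the level logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Expected value of the level weights under the softmax distribution.
    return sum(p * w for p, w in zip(probs, WEIGHTS))


if __name__ == "__main__":
    # A model that strongly favours "good" lands close to 4.
    print(round(levels_to_score([1.0, 6.0, 1.0, 0.0, 0.0]), 3))
```

Because the output is an expectation over levels, it remains continuous even though the model is only trained to emit discrete words.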

Quick Start & Requirements

  • Installation: Can be used directly via Hugging Face AutoModel or by installing the repository (pip install -e .). For training, additional dependencies are required (pip install -e ".[train]" and flash_attn).
  • Prerequisites: Python and PyTorch. For training, NVIDIA GPUs with CUDA are recommended (e.g., 2x RTX 3090 for LoRA fine-tuning, 4x A100 or 8x A6000 for full training).
  • Resources: Inference is far less demanding than training, which requires multiple high-end GPUs.
  • Links: Model Zoo, Homepage, Technical Report, OneScorer HF Space.
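
The Hugging Face route mentioned under Installation might look like the sketch below. Everything specific here is an assumption to check against the upstream README: the checkpoint id `q-future/one-align`, the `score` method provided by the checkpoint's remote code, and its `task_` and `input_` keyword arguments. The helpers `load_scorer` and `score_images` are hypothetical wrappers. The sketch also assumes `torch`, `transformers`, `accelerate` (needed for `device_map="auto"`), and `Pillow` are installed.

```python
from typing import Any, Sequence

# Task names assumed to be accepted by the checkpoint's `score` method.
VALID_TASKS = {"quality", "aesthetics"}


def load_scorer(model_id: str = "q-future/one-align") -> Any:
    """Load the scorer through AutoModel (model id is an assumption)."""
    # Imported lazily so the module can be inspected without the heavy deps.
    import torch
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,  # the scoring logic ships with the checkpoint
        torch_dtype=torch.float16,
        device_map="auto",
    )


def score_images(model: Any, image_paths: Sequence[str], task: str = "quality") -> Any:
    """Score local image files for the given task."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    from PIL import Image

    images = [Image.open(path).convert("RGB") for path in image_paths]
    # `score`, `task_`, and `input_` are assumed names from the remote code.
    return model.score(images, task_=task, input_="image")
```

A call such as `score_images(load_scorer(), ["photo.jpg"], task="aesthetics")` would download the checkpoint on first use, so expect a multi-gigabyte download and a GPU with enough memory for half-precision weights.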

Highlighted Details

  • Unified model for IQA, IAA, and VQA.
  • Paper accepted at ICML 2024.
  • Supports LoRA fine-tuning on 2x RTX 3090.
  • Updated to be compatible with transformers==4.36.1.

Maintenance & Community

The project is associated with Nanyang Technological University and Shanghai Jiao Tong University. Contact information for authors is provided for queries.

Licensing & Compatibility

The repository appears to use a permissive license, but the README does not state one explicitly. Verify the license terms before using the project in commercial or closed-source work.

Limitations & Caveats

The README notes that the v1.1 update is incompatible with earlier versions (v1.0.1, v1.0.0, and before). Training from scratch requires substantial GPU resources. The terms for commercial use are not clearly defined.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 50 stars in the last 90 days
