Robust-R1 by jqtangust

Degradation-aware reasoning for robust visual understanding

Created 3 months ago

528 stars

Top 59.8% on SourcePulse

Project Summary

Robust-R1 targets the problem of keeping visual understanding reliable when input images are degraded. It provides a degradation-aware reasoning framework for researchers and engineers working with multimodal AI systems, with the aim of improving model robustness and interpretability on noisy or corrupted visual inputs. The project ships code, pre-trained models, and datasets.

How It Works

Robust-R1 starts from the Qwen2.5-VL-Base model and fine-tunes it in two stages. The first stage is supervised fine-tuning (SFT) with the LLaMA-Factory framework. The second stage applies reinforcement learning (RL) to strengthen the model's reasoning under degraded conditions. The core innovation is a degradation-aware reasoning mechanism that explicitly accounts for how image corruptions affect semantic understanding, rather than optimizing the visual encoder and the language model in isolation.
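To make the idea of degradation-aware reasoning more concrete, the sketch below shows one way such a reasoning trace could be structured: first describe the degradation, then its effect on what can be seen, then answer. This is an illustration only. The `ReasoningTrace` class, its field names, and the `build_prompt` wording are all hypothetical and are not taken from the Robust-R1 code or paper; the project trains this behavior into the model rather than relying on a prompt.

```python
from dataclasses import dataclass


@dataclass
class ReasoningTrace:
    """Hypothetical container for a degradation-aware answer (not the project's format)."""
    degradation: str  # what kind of corruption is observed, e.g. "motion blur"
    severity: str     # rough strength, e.g. "mild", "moderate", "severe"
    impact: str       # how the corruption limits what can be read from the image
    answer: str       # final answer, given the limits stated above

    def render(self) -> str:
        # One line per reasoning step, with the answer last.
        return "\n".join([
            f"Degradation: {self.degradation} ({self.severity})",
            f"Impact: {self.impact}",
            f"Answer: {self.answer}",
        ])


def build_prompt(question: str) -> str:
    """Assumed prompt wording that asks for the three steps in order."""
    return (
        "The image may be degraded (for example by blur, noise, or compression).\n"
        "1. Name the degradation you observe and estimate its severity.\n"
        "2. Explain how it limits what can be reliably seen.\n"
        "3. Answer the question using only evidence that survives the degradation.\n"
        f"Question: {question}"
    )


# Example usage with made-up values.
trace = ReasoningTrace(
    degradation="motion blur",
    severity="moderate",
    impact="small text is unreadable; large shapes and colors remain visible",
    answer="a red octagonal road sign",
)
print(build_prompt("What object is in the foreground?"))
print(trace.render())
```

Keeping the degradation description separate from the answer is what makes the output inspectable: a reader can check whether the stated impact justifies the confidence of the final answer.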

Quick Start & Requirements

Installation consists of cloning the repository, creating a Conda environment with Python 3.10, and running the setup.sh script. Key dependencies are LLaMA-Factory and VLMEvalKit. The checkpoints (Qwen2.5-VL-Base, Robust-R1-SFT, Robust-R1-RL) are hosted on HuggingFace, where the paper and dataset are also linked. The project provides a CLI demo, a local GUI demo (served at http://localhost:7860), and an online demo on HF Space.

Highlighted Details

Featured as an AAAI 2026 Oral paper, indicating significant research contribution.
Focuses on "Degradation-Aware Reasoning" for robust visual understanding.
Provides an "Image Degradation Pipeline" for generating corrupted images to systematically evaluate model robustness.
Evaluated using VLMEvalKit and R-Bench to assess performance against various degradation types and real-world corruptions.
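The idea behind an image degradation pipeline can be sketched in a few lines: apply corruptions whose strength grows with a severity level, so the same evaluation can be repeated at each level. The code below is a minimal standard-library sketch under stated assumptions and is not the project's pipeline. The function names, the choice of corruptions (box blur, pixelation, Gaussian noise), and the 0 to 3 severity scale are all hypothetical.

```python
import random

Image = list[list[float]]  # grayscale image, pixel values in [0, 255]


def clip(value: float) -> float:
    """Keep a pixel value inside the valid [0, 255] range."""
    return max(0.0, min(255.0, value))


def box_blur(img: Image, radius: int) -> Image:
    """Average each pixel with its neighbors inside a (2*radius+1) square window."""
    if radius <= 0:
        return [row[:] for row in img]
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            row.append(sum(window) / len(window))
        out.append(row)
    return out


def pixelate(img: Image, factor: int) -> Image:
    """Simulate resolution loss: repeat the top-left pixel of each factor x factor block."""
    if factor <= 1:
        return [row[:] for row in img]
    return [[img[(y // factor) * factor][(x // factor) * factor]
             for x in range(len(img[0]))]
            for y in range(len(img))]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise with standard deviation sigma, then clip."""
    return [[clip(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def degrade(img: Image, severity: int, seed: int = 0) -> Image:
    """Chain blur, pixelation, and noise; severity 0 leaves the image unchanged."""
    if not 0 <= severity <= 3:
        raise ValueError("severity must be between 0 and 3")
    rng = random.Random(seed)  # seeded so each severity level is reproducible
    out = box_blur(img, radius=severity)
    out = pixelate(out, factor=severity + 1)
    return add_gaussian_noise(out, sigma=10.0 * severity, rng=rng)


def mean_abs_diff(a: Image, b: Image) -> float:
    """Average per-pixel absolute difference between two same-sized images."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))


# Example usage: sweep severity levels on a synthetic checkerboard.
clean = [[255.0 if (x // 2 + y // 2) % 2 else 0.0 for x in range(8)] for y in range(8)]
for level in range(4):
    print(level, round(mean_abs_diff(clean, degrade(clean, level)), 2))
```

Seeding the random generator per call keeps each severity level reproducible, which matters when the same corrupted images have to be scored by several models.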

Maintenance & Community

The project is supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region. It acknowledges contributions from authors of VLM-R1, LLaMA-Factory, and R-Bench. No specific community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The project is released under the MIT License, which generally permits broad use, including commercial applications, with minimal restrictions. No specific compatibility notes for closed-source linking or other integration challenges are detailed.

Limitations & Caveats

The README does not explicitly list known limitations, bugs, or unsupported platforms. As a research project accepted for AAAI 2026, it may not yet be optimized for production deployment. The setup process requires integrating multiple external repositories and managing specific dependencies, which could present an adoption hurdle.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

55 stars in the last 30 days