JarvisArt by LYL1015

Intelligent photo retouching agent

Created 11 months ago
822 stars

Top 42.7% on SourcePulse

Project Summary

JarvisArt is an agent driven by a multimodal large language model (MLLM) and designed to automate and enhance photo retouching. It is aimed at users who want professional-grade edits from natural-language instructions, and it seeks to make artistic photo editing more accessible by coordinating more than 200 Adobe Lightroom tools.

How It Works

JarvisArt uses a two-stage training framework. The first stage, Chain-of-Thought supervised fine-tuning, establishes foundational reasoning skills. The second stage, Group Relative Policy Optimization for Retouching (GRPO-R), is a reinforcement-learning stage that improves the agent's decision-making and its proficiency with a wide range of editing tools. Together, the two stages let JarvisArt follow complex retouching instructions and mimic the workflow of a professional artist.
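
The retouching-specific reward design of GRPO-R is not described on this page. The sketch below only illustrates the generic GRPO idea it builds on: sample several responses for the same instruction, score each one, and normalize every reward against its own group's mean and standard deviation instead of using a learned value network. The normalized advantage then feeds a PPO-style clipped objective. The function names and reward values are hypothetical, the population standard deviation is an assumption (implementations differ), and the KL penalty term used in GRPO is omitted for brevity.

```python
import math
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of responses sampled for the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps), so above-average responses get a
    positive advantage and below-average ones a negative advantage.
    """
    if not rewards:
        raise ValueError("a group needs at least one reward")
    n = len(rewards)
    mean = sum(rewards) / n
    # Population standard deviation (an assumption; some implementations use n - 1).
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def clipped_surrogate(ratio: float, advantage: float, clip_eps: float = 0.2) -> float:
    """PPO-style clipped objective term for one sample (to be maximized).

    `ratio` is pi_new(response) / pi_old(response); clipping keeps a single
    update from moving the policy too far from the sampling policy.
    """
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical reward scores for four candidate edits of the same photo.
advantages = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
```

With these made-up scores, the best edit receives an advantage of roughly +1.41, the worst roughly -1.41, and the two average edits roughly 0, so the policy is pushed toward edits that beat their siblings rather than toward an absolute reward level.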

Quick Start & Requirements

  • Demo: A Hugging Face online demo is available at JarvisArt-Preview.
  • Inference: Inference code and a Gradio demo are available. Batch inference instructions are provided in the Batch Inference documentation.
  • Dependencies: Requires Adobe Lightroom. Specific software dependencies for local inference are detailed in the project's documentation.

Highlighted Details

  • Outperforms GPT-4o by 60% in pixel-level metrics for content fidelity.
  • Supports multi-granularity retouching, from scene-level to region-specific edits.
  • Understands natural language prompts and bounding box inputs for intuitive editing.
  • Leverages over 200 tools within Adobe Lightroom.
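
The page does not say which pixel-level metrics underlie the 60% figure or how they are aggregated. As a rough illustration of what such a comparison involves, the sketch below computes two common pixel-level fidelity measures between an edited image and a reference, mean absolute (L1) error and PSNR, plus the relative error reduction of one method over another. The choice of metrics, the function names, and the example numbers are all assumptions for illustration, not JarvisArt's evaluation code.

```python
import math
from typing import Sequence


def _check_same_size(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and have the same number of pixels")


def l1_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference for images flattened to values in [0, 1]."""
    _check_same_size(a, b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[float], b: Sequence[float], max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB (higher means closer to the reference)."""
    _check_same_size(a, b)
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_val ** 2 / mse)


def relative_error_reduction(baseline: float, candidate: float) -> float:
    """Fractional improvement for a lower-is-better metric such as L1 error."""
    return (baseline - candidate) / baseline
```

With hypothetical L1 errors of 0.10 for a baseline and 0.04 for a candidate, `relative_error_reduction(0.10, 0.04)` returns about 0.6, i.e. a 60% reduction; for a higher-is-better metric such as PSNR, the comparison has to be computed the other way around.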

Maintenance & Community

The project has shipped several releases, including inference code, a Gradio demo, and a Hugging Face demo. A WeChat discussion group is available for user support and feedback.

Licensing & Compatibility

No license is specified for the project. The README mentions plans to release the MMArt dataset under an open license, but that release has not happened yet. As a result, it is unclear whether the project can be used in commercial or closed-source applications.

Limitations & Caveats

The MMArt dataset and full training code are not yet released. The project relies on Adobe Lightroom, which is proprietary software, potentially limiting its standalone usability and integration into non-Lightroom workflows.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 116 stars in the last 30 days