JarvisArt  by LYL1015

Intelligent photo retouching agent

Created 2 months ago
623 stars

Top 53.0% on SourcePulse

Project Summary

JarvisArt is an agent driven by a multimodal large language model (MLLM) that automates photo retouching. It targets users who want professional-grade editing through natural language commands, and aims to democratize artistic photo manipulation by coordinating over 200 Adobe Lightroom tools.

How It Works

JarvisArt employs a novel two-stage training framework. It begins with Chain-of-Thought supervised fine-tuning to establish foundational reasoning skills. This is followed by Group Relative Policy Optimization for Retouching (GRPO-R), a technique designed to improve the agent's decision-making and proficiency in utilizing a wide array of editing tools. This approach allows JarvisArt to mimic professional artist workflows and understand complex retouching instructions.
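The group-relative idea behind the second stage can be illustrated with a minimal sketch. It shows only the generic GRPO step of scoring each sampled candidate against the other candidates in its own group; the retouching-specific rewards and any other GRPO-R details are not described here, so the reward values and the function name below are hypothetical.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    Generic GRPO-style advantage; NOT the project's GRPO-R implementation.
    `eps` guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four candidate edits of the same photo and prompt.
rewards = [0.2, 0.5, 0.9, 0.4]
advantages = group_relative_advantages(rewards)

# Candidates above the group mean get a positive advantage, the rest negative.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Because the baseline is the group mean, this kind of update needs no separate value network; the policy is pushed toward candidates that beat their siblings.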

Quick Start & Requirements

  • Demo: A Hugging Face online demo is available at JarvisArt-Preview.
  • Inference: Inference code and Gradio demo are available. Instructions for batch inference are provided in the Batch Inference documentation.
  • Dependencies: Requires Adobe Lightroom. Specific software dependencies for local inference are detailed in the project's documentation.

Highlighted Details

  • Reports a 60% improvement over GPT-4o in pixel-level metrics for content fidelity.
  • Supports multi-granularity retouching, from scene-level to region-specific edits.
  • Understands natural language prompts and bounding box inputs for intuitive editing.
  • Leverages over 200 tools within Adobe Lightroom.
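To make the pixel-level comparison in the list above concrete, here is a minimal sketch of how a relative improvement on a pixel-error metric can be computed. The choice of mean absolute error, the function names, and all numbers are assumptions for illustration; this summary does not say which metrics the project uses, and the values are not measured results.

```python
def mean_abs_error(pred, target):
    """Mean absolute per-pixel difference between two flattened images."""
    if len(pred) != len(target):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def relative_improvement(baseline_err, model_err):
    """Fraction by which model_err is lower than baseline_err."""
    return (baseline_err - model_err) / baseline_err

# Illustrative pixel values in [0, 1]; not taken from any benchmark.
target   = [0.0, 0.5, 1.0, 0.5]
baseline = [0.2, 0.3, 0.8, 0.7]   # hypothetical baseline output
model    = [0.1, 0.5, 0.9, 0.5]   # hypothetical model output

base_err = mean_abs_error(baseline, target)
model_err = mean_abs_error(model, target)
print(f"relative improvement: {relative_improvement(base_err, model_err):.0%}")
```

A lower error against the reference image means the edit preserved more of the original content, which is what a "content fidelity" metric is meant to capture.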

Maintenance & Community

The project is actively updated; recent releases include inference code, a Gradio demo, and a Hugging Face demo. A WeChat discussion group is available for user support and feedback.

Licensing & Compatibility

The project does not specify a license. The README mentions plans to release the MMArt dataset under an open license, but that release is not yet complete. Compatibility with commercial or closed-source applications is not specified.

Limitations & Caveats

The MMArt dataset and full training code are not yet released. The project relies on Adobe Lightroom, which is proprietary software, potentially limiting its standalone usability and integration into non-Lightroom workflows.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star History: 16 stars in the last 30 days
