JarvisEvo by LYL1015

AI agent for synergistic photo editing

Created 3 months ago
338 stars

Top 81.9% on SourcePulse

Project Summary

Summary

JarvisEvo is a self-evolving AI agent for complex photo editing tasks. It targets users who need sophisticated, automated image manipulation, and it combines precise adjustments with creative generation. The primary benefit is an autonomous system that learns and refines its editing strategies so that its outputs better match the intended visual result.

How It Works

The system employs interleaved multimodal Chain-of-Thought (iMCoT) reasoning, which combines multi-step planning, dynamic tool orchestration, and iterative visual feedback. Because each step is checked against visual feedback, the agent can evaluate and refine its own edits in a closed loop. A key design is the Synergistic Editor-Evaluator Policy Optimization (SEPO) framework, a dual-loop reinforcement learning mechanism in which the agent acts as both editor and evaluator and optimizes its strategies via intrinsic rewards. The agent calls professional tools such as Adobe Lightroom for precise adjustments and Qwen-Image-Edit for generative tasks.
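The closed loop described above (plan a step, call a tool, inspect the result, keep or reject the edit) can be illustrated with a toy sketch. This is not JarvisEvo's code: the image is reduced to a dictionary of scalar attributes, and `TOOLS`, `plan_step`, `evaluate`, and `run_agent` are hypothetical names chosen only for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a toy "image" is a dict of scalar attributes, not real pixels.
Image = Dict[str, float]
Tool = Callable[[Image, float], Image]


def adjust_exposure(img: Image, amount: float) -> Image:
    """Hypothetical tool: shift brightness by `amount`."""
    out = dict(img)
    out["brightness"] += amount
    return out


def adjust_contrast(img: Image, amount: float) -> Image:
    """Hypothetical tool: shift contrast by `amount`."""
    out = dict(img)
    out["contrast"] += amount
    return out


TOOLS: Dict[str, Tool] = {"exposure": adjust_exposure, "contrast": adjust_contrast}
# Which tool is responsible for which attribute (illustrative mapping).
TOOL_FOR_ATTR = {"brightness": "exposure", "contrast": "contrast"}


def evaluate(img: Image, target: Image) -> float:
    """Stand-in evaluator: 0.0 is a perfect match, lower is worse."""
    return -sum(abs(img[k] - target[k]) for k in target)


def plan_step(img: Image, target: Image) -> Tuple[str, float]:
    """Plan one step: fix the attribute that is furthest from the target."""
    attr = max(target, key=lambda k: abs(img[k] - target[k]))
    return TOOL_FOR_ATTR[attr], target[attr] - img[attr]


def run_agent(img: Image, target: Image, max_steps: int = 5,
              tol: float = 1e-6) -> Tuple[Image, List[Tuple[str, float, float]]]:
    """Plan, edit, evaluate, and keep an edit only if the score improves."""
    trace: List[Tuple[str, float, float]] = []
    score = evaluate(img, target)
    for _ in range(max_steps):
        if score >= -tol:          # good enough, stop editing
            break
        tool, amount = plan_step(img, target)
        candidate = TOOLS[tool](img, amount)
        new_score = evaluate(candidate, target)
        if new_score <= score:     # feedback says the edit did not help
            break
        img, score = candidate, new_score
        trace.append((tool, amount, new_score))
    return img, trace
```

In the real system the evaluator is the agent itself and the feedback is visual. The scalar score here only stands in for that judgment.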

Quick Start & Requirements

The repository provides guides for Batch Inference, Training, and Evaluation. An "Agent-to-Lightroom Protocol" is detailed for distributed training. Specific installation commands, hardware requirements (e.g., GPU, CUDA), or estimated setup times are not detailed in the README.

Highlighted Details

  • Interleaved Multimodal Chain-of-Thought (iMCoT): Enables closed-loop reasoning by validating each step against both text and visual feedback, which reduces errors.
  • Synergistic Editor-Evaluator Policy Optimization (SEPO): A self-evolving framework using dual-loop reinforcement learning for autonomous strategy refinement.
  • Unified Preservative & Generative Editing: Integrates over 200 Adobe Lightroom tools for precise adjustments alongside Qwen-Image-Edit for creative synthesis (e.g., object removal, style transfer).
  • Self-Reflective Learning Mechanism: Automatically generates reflection trajectories on suboptimal results to continuously optimize tool selection logic.
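As a rough illustration of the self-reflective mechanism in the last item, the sketch below scans an editing trajectory and emits a reflection record for every step that lowered the evaluation score. The `Step` and `Reflection` types, the `reflect` function, and the score values are all hypothetical. The project's actual reflection format is not described in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    tool: str            # hypothetical tool name
    score_before: float  # evaluator score before the edit
    score_after: float   # evaluator score after the edit


@dataclass
class Reflection:
    step_index: int
    tool: str
    drop: float
    note: str


def reflect(trajectory: List[Step], min_drop: float = 0.0) -> List[Reflection]:
    """Return one reflection per step whose score fell by more than `min_drop`."""
    reflections: List[Reflection] = []
    for i, step in enumerate(trajectory):
        drop = step.score_before - step.score_after
        if drop > min_drop:
            note = (f"'{step.tool}' lowered the score by {drop:.2f}; "
                    "consider a different tool or a smaller adjustment")
            reflections.append(Reflection(i, step.tool, drop, note))
    return reflections
```

Records like these could then feed back into tool selection, for example as extra training examples. How JarvisEvo consumes them is not specified here.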

Maintenance & Community

A WeChat group is available for user discussions and suggestions. Contact information for inquiries is provided. The project is associated with Tencent Hunyuan and Xiamen University.

Licensing & Compatibility

JarvisEvo is released under the Apache License 2.0, a permissive license that allows commercial use and integration into closed-source projects, provided its license and notice requirements are met.

Limitations & Caveats

The "Open-source Plan" indicates that the SEPO and RFT training code has not yet been released. The system relies on Adobe Lightroom for precise adjustments, which may be a significant dependency for users.

Health Check

Last Commit: 3 days ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 1

Star History

67 stars in the last 30 days