imagesorcery-mcp by sunriseapps

Local image processing and recognition for AI assistants

Created 8 months ago
266 stars

Top 96.3% on SourcePulse

Project Summary

ImageSorcery MCP provides a suite of local image processing and recognition tools for AI assistants. It lets AI agents perform image manipulation, object detection, and text extraction directly on a user's machine, which keeps data private and removes the need for cloud-based services. This makes it a good fit for developers building AI-powered applications that need on-device image handling.

How It Works

ImageSorcery MCP runs as an MCP (Model Context Protocol) server, exposing its image processing tools through a standardized interface. It uses OpenCV for fundamental operations, Ultralytics for object detection and segmentation, and EasyOCR for text extraction. Users describe a task in natural language to an AI assistant, which then calls the appropriate ImageSorcery MCP tools. The core advantage is the local execution model: all images and data are processed on the user's system without external data transmission.
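
To make the tool-calling step concrete, here is a minimal sketch of the kind of message an MCP client sends when the assistant decides to invoke a tool. MCP is built on JSON-RPC 2.0 and tools are invoked with the `tools/call` method. The tool name `crop` comes from the feature list in this summary, but the argument names (`input_path`, `x1`, `y1`, `x2`, `y2`) and the file path are hypothetical and chosen only for illustration; the server's actual tool schemas may differ.

```python
import json
from itertools import count

# JSON-RPC requests carry an id so that responses can be matched to them.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical argument names and path, for illustration only.
message = build_tool_call(
    "crop",
    {"input_path": "/tmp/photo.jpg", "x1": 0, "y1": 0, "x2": 200, "y2": 100},
)
print(message)
```

In practice the MCP client builds and sends these messages on the assistant's behalf, so users never write them by hand.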

Quick Start & Requirements

  • Primary install: pipx install imagesorcery-mcp (recommended).
  • Prerequisites: Python 3.10 or higher, pipx, and system libraries ffmpeg, libsm6, libxext6, libgl1-mesa-glx (required by OpenCV). An MCP client (e.g., Claude.app, Cline) is necessary for interaction.
  • Setup: Run imagesorcery-mcp --post-install after installation; it downloads the required models and attempts to install the clip package. The README gives detailed instructions for manual virtual environment setups and for potential issues with uv venv.
  • Links: Official website: imagesorcery.net.
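
The prerequisites above can be checked before installing. The following is a small sketch, not part of the project, that verifies the Python version and looks for the `pipx` and `ffmpeg` executables on `PATH`. The function names are made up for this example. Shared libraries such as libsm6 or libxext6 are not executables, so this script does not check them.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)                    # minimum version stated in the prerequisites
REQUIRED_COMMANDS = ("pipx", "ffmpeg")  # executables expected on PATH


def python_ok(version=None) -> bool:
    """Return True if the given (or running) Python version is at least 3.10."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MIN_PYTHON


def missing_commands(commands=REQUIRED_COMMANDS, which=shutil.which) -> list:
    """Return the commands that cannot be found on PATH."""
    return [name for name in commands if which(name) is None]


print("Python version OK:", python_ok())
print("Missing commands:", missing_commands() or "none")
```

If both checks pass, the install itself is the two commands named above: `pipx install imagesorcery-mcp`, then `imagesorcery-mcp --post-install`.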

Highlighted Details

  • Comprehensive toolset includes: crop, resize, rotate, background removal, drawing (text, shapes, arrows), color manipulation, object detection, OCR, and image overlay.
  • Supports advanced features like object segmentation masks and text field detection.
  • Enables complex, multi-step image tasks through natural language prompts directed at an AI assistant.
  • All operations are performed locally, ensuring user privacy and data security.

Maintenance & Community

The project lists contact points for the author (titulus) and CEO (Vlad Karm) via LinkedIn. Users are encouraged to open issues in the repository for bug reports or feature requests. Specific community channels like Discord or Slack are not detailed in the README.

Licensing & Compatibility

This project is licensed under the MIT License, permitting broad use, modification, and distribution, including for commercial purposes and integration into closed-source applications.

Limitations & Caveats

The installation of the clip Python package, required for text-based image searching, can be complex and may require manual intervention, particularly when using uv venv. Users must have an MCP client configured to communicate with the ImageSorcery MCP server. Certain system libraries may need to be installed separately depending on the operating system or container environment.
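
Two of these caveats can be checked with a short script. The sketch below tests whether a module named `clip` is importable in the current environment, and prints a client configuration entry. The `mcpServers` / `command` / `args` layout is an assumption based on the convention used by several MCP clients; the exact keys and the location of the configuration file depend on the client, so check its documentation and the project README before relying on it.

```python
import importlib.util
import json


def clip_installed() -> bool:
    """Return True if a top-level module named `clip` can be imported."""
    return importlib.util.find_spec("clip") is not None


def client_config(command: str = "imagesorcery-mcp") -> dict:
    """Build a client config entry (assumed layout, not taken from the README)."""
    return {"mcpServers": {"imagesorcery-mcp": {"command": command, "args": []}}}


print("clip importable:", clip_installed())
print(json.dumps(client_config(), indent=2))
```

Run the script with the same interpreter that hosts imagesorcery-mcp (for example, inside the pipx-managed environment), since `clip` may be present in one environment and absent in another.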

Health Check

  • Last commit: 3 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 19 stars in the last 30 days
