mrgoonie: Human-like multimodal capabilities for AI agents
Top 95.1% on SourcePulse
This project provides a comprehensive Model Context Protocol (MCP) server, Human MCP, designed to equip AI coding agents with human-like multimodal capabilities. It addresses the gap in current AI agents by integrating visual analysis, document processing, content creation, speech generation, browser automation, and advanced reasoning, enabling more sophisticated debugging, understanding, and enhancement of multimodal content. The target audience includes AI developers and users seeking to empower their AI agents with a richer set of human-like functionalities.
How It Works
Human MCP acts as a middleware, exposing 29 distinct tools categorized into four human capabilities: Eyes (visual/document analysis), Hands (content generation/editing/automation), Mouth (speech generation), and Brain (advanced reasoning). It leverages a diverse technology stack, including Google Gemini (for vision, document, speech, image, and video processing), Imagen API, Veo API, ElevenLabs, Minimax, ZhipuAI, Playwright for browser automation, and Jimp for local image manipulation. A key advantage is its multi-provider support, allowing users to select preferred AI models for each capability, offering flexibility and cost optimization.
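Under the hood, MCP clients invoke server tools via JSON-RPC 2.0 messages (the tools/call method) sent over the chosen transport. The sketch below shows the envelope a client would construct for one of these calls; the tool name eyes_analyze and its arguments are illustrative assumptions, not taken from Human MCP's documentation.

```typescript
// Sketch of the JSON-RPC 2.0 envelope an MCP client sends to invoke a
// server-side tool. The method "tools/call" comes from the MCP spec;
// the tool name and argument shape below are hypothetical examples.
type ToolCallRequest = {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
};

function buildToolCall(
  id: number,
  name: string,
  args: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical call to a visual-analysis ("Eyes") tool.
const req = buildToolCall(1, "eyes_analyze", { source: "screenshot.png" });
console.log(JSON.stringify(req));
```

In practice a client library (or an MCP host such as Claude Desktop) builds and dispatches these messages for you; the point here is only the wire shape that the 29 exposed tools share.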
Quick Start & Requirements
Run the server with npx @goonnguyen/human-mcp, or bun run dev for local development. API keys are supplied via environment variables (e.g., GOOGLE_GEMINI_API_KEY) or within client-specific configuration files. Detailed setup guides are available for various MCP clients, including Claude Desktop, Claude Code CLI, and Cursor.
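For clients that register MCP servers through a JSON configuration file (such as Claude Desktop's mcpServers block), the entry would typically look like the sketch below; the exact file location, server label, and required environment variables should be checked against the project's setup guides.

```json
{
  "mcpServers": {
    "human-mcp": {
      "command": "npx",
      "args": ["@goonnguyen/human-mcp"],
      "env": {
        "GOOGLE_GEMINI_API_KEY": "<your-api-key>"
      }
    }
  }
}
```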
Maintenance & Community
The project outlines a "Development Roadmap & Vision" and encourages community involvement through "Getting Involved" sections, including issue reporting and discussions. While specific contributors or sponsorships are not detailed, the roadmap indicates ongoing development towards completing the human sensory suite with planned audio processing capabilities.
Licensing & Compatibility
The project is released under the MIT License, which generally permits commercial use and modification. Users should consult the terms of service for any third-party AI provider APIs used.
Limitations & Caveats
The project is actively under development, with audio processing ("Ears") planned for Q1 2025, indicating that not all core human sensory capabilities are yet implemented. Setup requires obtaining and configuring multiple API keys, which may incur costs from AI service providers. While stdio transport is available, HTTP transport with Cloudflare R2 integration is detailed for certain clients, adding a dependency for cloud-based file handling.