doris-mcp-server by apache

MCP server for Apache Doris database interaction and LLM integration

Created 9 months ago
257 stars

Top 98.3% on SourcePulse

Project Summary

Apache Doris MCP Server is a Python backend service, built on FastAPI, for interacting with Apache Doris databases. It implements the Model Context Protocol (MCP), so clients can use Large Language Models (LLMs) for natural-language-to-SQL (NL2SQL) conversion, query execution, metadata management, and data analysis. It is aimed at data professionals and researchers who need a scalable, secure gateway to Doris data.

How It Works

The server sits between MCP clients and Apache Doris, translating MCP requests into actions against the database. It has a modular architecture with dedicated managers for tools, resources, and prompts. Two transports are supported: streamable HTTP for web services and stdio for direct integration with clients such as Cursor. Key design choices are a stateless multi-worker design for horizontal scalability and connection pooling with session caching to reduce overhead. Security measures include input validation, role-based access control, and SQL injection protection.
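
To make the connection-pooling idea concrete, here is a minimal sketch of a fixed-size pool that hands out and takes back connections instead of opening one per request. It is not the project's implementation: ConnectionPool and FakeConnection are hypothetical names, and a real deployment would pass a factory that opens an actual Doris connection.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Hypothetical stand-in for a database connection; counts how many were opened."""

    created = 0

    def __init__(self):
        FakeConnection.created += 1
        self.id = FakeConnection.created


class ConnectionPool:
    """Tiny fixed-size pool. `factory` is any zero-argument callable returning a connection."""

    def __init__(self, factory, size=2):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            # Open all connections up front so requests never pay the setup cost.
            self._idle.put(factory())

    @contextmanager
    def connection(self, timeout=5.0):
        # Block until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the caller raised.
            self._idle.put(conn)
```

Callers borrow a connection with `with pool.connection() as conn:`. Because connections are returned rather than closed, the number of opened connections stays at the pool size no matter how many requests are served.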

Quick Start & Requirements

  • Installation: pip install doris-mcp-server
  • Prerequisites: Python 3.12+, Apache Doris database connection details (host, port, user, password, database).
  • Running:
    • HTTP Mode: doris-mcp-server --transport http --host 0.0.0.0 --port 3000 --db-host <doris_host> --db-port <doris_port> --db-user <doris_user> --db-password <doris_password>
    • Stdio Mode: doris-mcp-server --transport stdio
  • Configuration: Environment variables (e.g., DORIS_HOST) or command-line arguments can be used.
  • Token Management Interface: Accessible at http://localhost:3000/token/management (requires specific environment variables and admin token).
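
As an illustration of combining the two configuration styles, the sketch below reads connection details from environment variables and assembles the HTTP-mode command line shown above. Only DORIS_HOST is named in this summary; DORIS_PORT, DORIS_USER, and DORIS_PASSWORD are assumed names used for illustration, and build_http_command is a hypothetical helper, not part of the package.

```python
import os


def build_http_command(env=None, host="0.0.0.0", port=3000):
    """Return the argv list for HTTP mode, using the flags from the Running section.

    DORIS_PORT, DORIS_USER and DORIS_PASSWORD are assumed variable names;
    only DORIS_HOST appears in the summary above.
    """
    env = os.environ if env is None else env
    return [
        "doris-mcp-server",
        "--transport", "http",
        "--host", host,
        "--port", str(port),
        "--db-host", env["DORIS_HOST"],
        "--db-port", env["DORIS_PORT"],
        "--db-user", env["DORIS_USER"],
        "--db-password", env["DORIS_PASSWORD"],
    ]
```

The resulting list can be passed to `subprocess.run` as-is. Building an argument list rather than a shell string avoids quoting problems when the password contains special characters.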

Highlighted Details

  • v0.6.0 Enterprise Authentication: Introduces token-bound database configurations, JWT, and OAuth support with granular control switches and enterprise-grade security defaults.
  • Zero-Downtime Operations: Features immediate database validation at connection time and hot-reloading of configurations (e.g., tokens.json) without service interruption.
  • Scalability & Performance: Employs a stateless multi-worker architecture for true horizontal scaling and advanced connection pooling that reduces overhead by up to 60%.
  • ADBC Integration: Uses ADBC (Arrow Database Connectivity) over Apache Arrow Flight SQL, with a claimed 3-10x performance improvement when handling large datasets.
  • Data Governance & Analytics: Includes 7 enterprise-grade tools for data quality analysis, lineage tracking, performance monitoring, and more.
  • Catalog Federation: Seamlessly interacts with multiple data catalogs, including internal Doris tables and external sources like Hive and MySQL.
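
One common way to hot-reload a file such as tokens.json without restarting is to check its modification time on each access and re-read it only when that changes. The sketch below shows the pattern under that assumption; the summary does not say how the server actually detects changes, and ReloadingJsonFile is a hypothetical name.

```python
import json
import os


class ReloadingJsonFile:
    """Serve parsed JSON from `path`, re-reading the file only when its mtime changes."""

    def __init__(self, path):
        self._path = path
        self._mtime_ns = None  # mtime of the version currently cached
        self._data = None

    def get(self):
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            # File is new or was modified since the last read: parse it again.
            with open(self._path, encoding="utf-8") as fh:
                self._data = json.load(fh)
            self._mtime_ns = mtime_ns
        return self._data
```

Each call to `get()` costs one `stat` call when nothing has changed. A production version would also need to handle a file that is mid-write or temporarily invalid, for example by keeping the previous data when parsing fails.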

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README.

Licensing & Compatibility

  • License: Apache 2.0 License.
  • Compatibility: Designed for Apache Doris. The permissive Apache 2.0 license generally allows for commercial use and integration with closed-source systems.

Limitations & Caveats

LLMs with small parameter counts may need specific prompt engineering to use the MCP tools effectively, as detailed in the project's FAQ. The web-based token management interface is restricted to localhost access for security reasons.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 8 stars in the last 30 days
