tangpan360
AI agent for microservice root cause analysis using multi-modal data
This project addresses the complex challenge of microservice root cause analysis by leveraging a multi-modal data approach powered by Large Language Model (LLM) agents. It is designed for engineers and researchers working with microservice architectures who need to quickly identify and diagnose faults. The solution offers a structured, reasoning-trace-backed output, aiming to provide a complete closed loop from fault observation to root cause identification, as demonstrated by its Top 5 ranking in the 2025 CCF International AIOps Challenge.
How It Works
The system employs a modular architecture comprising five core components: data preprocessing, log fault extraction, trace fault detection, metric fault summarization, and multi-modal root cause analysis. Data interaction between these loosely coupled modules is managed through function encapsulation. Key techniques include the Drain3 algorithm for efficient log templating and data volume reduction, IsolationForest for detecting anomalies in trace durations, and LLM-based summarization for analyzing both application performance monitoring (APM) and infrastructure metrics. This multi-modal fusion, combined with LLM agents for reasoning, enables a comprehensive analysis across logs, traces, and metrics.
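The two non-LLM techniques named above can be illustrated with a minimal sketch. It is not the project's code. The log step uses simple regex masking as a stand-in for Drain3 (which builds a fixed-depth parse tree rather than applying fixed rules), and the trace step calls scikit-learn's IsolationForest on a single duration feature. The function names, masking rules, and parameter values are all hypothetical.

```python
import re
from collections import Counter

import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical masking rules standing in for Drain3's template mining.
# Order matters: specific patterns run before the generic number mask.
_MASKS = [
    (re.compile(r"\b\d+\.\d+\.\d+\.\d+\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+(?:\.\d+)?\b"), "<NUM>"),
]


def to_template(line: str) -> str:
    """Replace variable tokens so similar log lines share one template."""
    for pattern, token in _MASKS:
        line = pattern.sub(token, line)
    return line


def summarize_logs(lines):
    """Collapse raw log lines into template counts to reduce data volume."""
    return Counter(to_template(line) for line in lines)


def flag_anomalous_durations(durations_ms, seed=0):
    """Return indices of spans whose duration IsolationForest marks as anomalous.

    Assumes a single feature (duration in ms). The model flags isolated
    values in either direction, not only slow spans.
    """
    features = np.asarray(durations_ms, dtype=float).reshape(-1, 1)
    model = IsolationForest(
        n_estimators=100, contamination="auto", random_state=seed
    )
    labels = model.fit_predict(features)  # -1 = anomaly, 1 = normal
    return np.flatnonzero(labels == -1)


if __name__ == "__main__":
    logs = [
        "timeout after 3000 ms calling 10.0.0.5",
        "timeout after 1200 ms calling 10.0.0.9",
        "user 42 logged in",
    ]
    for template, count in summarize_logs(logs).items():
        print(count, template)

    rng = np.random.default_rng(0)
    durations = list(rng.normal(100, 10, 200)) + [5000.0]
    print(flag_anomalous_durations(durations))
```

In a pipeline like the one described, the template counts and the flagged span indices would be passed on as compact evidence for the LLM to reason over, instead of the raw logs and traces.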
Quick Start & Requirements
Dependencies are listed in src/requirements.txt and should be installed, preferably within a Conda environment (Python 3.10 recommended).
API credentials (KEJIYUN_API_KEY, KEJIYUN_API_BASE) must be configured in src/.env.
The LLM model is set in src/agent/llm_config.py (default: deepseek-chat).
Run bash run.sh after completing setup and configuration.
Competition page: https://challenge.aiops.cn/home/competition/1963605668447416345
Highlighted Details
... reasoning_trace.
Maintenance & Community
No specific community channels (e.g., Discord, Slack) or detailed maintenance information (e.g., recent commit activity, roadmap) are provided in the README. The authors are listed as Tang, Pan; Tang, Shixiang; Pu, Huanqi; Miao, Zhiqing; and Wang, Zhixing.
Licensing & Compatibility
The license type and any associated compatibility notes for commercial use or closed-source linking are not explicitly stated in the provided README content.
Limitations & Caveats
The project acknowledges that optimal root cause localization accuracy requires deeper integration of domain-specific Operations and Maintenance (O&M) knowledge, such as enterprise experience, key business indicators, and standardized fault diagnosis SOPs. Without access to such O&M resources, the current solution's accuracy may be limited. The system also depends on access to an external LLM API.