OpenRCA by microsoft

LLM-driven root cause analysis for software failures

Created 1 year ago
275 stars

Top 94.3% on SourcePulse

Project Summary

OpenRCA is a benchmark for evaluating how well Large Language Models (LLMs) perform root cause analysis (RCA) of software failures. It targets researchers and engineers who want to automate failure diagnosis: the LLM must analyze diverse telemetry data, including KPI time series, dependency graphs, and logs, to identify the root cause of a failure.

How It Works

The project uses LLMs to reason about complex system dependencies across several telemetry types. A key component, the RCA-agent, uses Python for data retrieval and analysis. Because retrieval and analysis run in Python rather than inside the prompt, the LLM does not have to process excessively long contexts; it can focus on reasoning, and the analysis scales to large volumes of telemetry.
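The idea of keeping raw telemetry out of the prompt can be illustrated with a minimal sketch. This is not the RCA-agent's actual code: the function names `summarize_kpi` and `build_prompt`, the z-score threshold, and the prompt wording are all assumptions made for illustration. Python reduces a KPI series to a few numbers, and only that summary would be sent to the model.

```python
from statistics import mean, pstdev


def summarize_kpi(series, threshold=3.0):
    """Reduce one KPI series to a compact summary (hypothetical helper).

    series: list of (timestamp, value) pairs.
    threshold: z-score cutoff for flagging a point; an assumed default.
    """
    values = [v for _, v in series]
    mu = mean(values)
    sigma = pstdev(values)
    anomalies = []
    if sigma > 0:  # a constant series has no outliers
        anomalies = [(t, v) for t, v in series if abs(v - mu) / sigma > threshold]
    return {"points": len(values), "mean": round(mu, 2), "anomalies": anomalies}


def build_prompt(summaries):
    """Turn per-KPI summaries into a short prompt (hypothetical wording)."""
    lines = ["Telemetry summary (computed in Python, raw data omitted):"]
    for name, s in summaries.items():
        lines.append(
            f"- {name}: {s['points']} points, mean={s['mean']}, "
            f"anomalies={s['anomalies']}"
        )
    lines.append("Which component is the most likely root cause?")
    return "\n".join(lines)


# Illustrative data only: 20 normal samples followed by one spike.
cpu = [(t, 10.0) for t in range(20)] + [(20, 100.0)]
prompt = build_prompt({"cpu_usage": summarize_kpi(cpu)})
```

The prompt length here depends on the number of KPIs and flagged points, not on the number of raw samples, which is the property the paragraph above describes.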

Quick Start & Requirements

  • Installation: Requires Python >= 3.10. Clone the repository, optionally create and activate a conda environment (conda create -n openrca python=3.10, then conda activate openrca), and install the dependencies (pip install -r requirements.txt).
  • Prerequisites: At least 80GB of storage and 32GB of RAM are recommended because the dataset is large and the operations are memory-intensive. Telemetry data must be downloaded separately from Google Drive and placed in the dataset/ directory, following a specified structure.
  • Links: Repository: https://github.com/microsoft/OpenRCA.git
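The requirements listed above can be checked before starting the large dataset download. The following is a minimal sketch, not a script shipped with the repository: the function name `check_prerequisites` and the result keys are hypothetical, and it only verifies the Python version, free disk space, and the presence of a dataset/ directory. The 32GB RAM recommendation is not checked, since the standard library has no portable way to query it.

```python
import shutil
import sys
from pathlib import Path

MIN_PYTHON = (3, 10)   # from the stated requirement "Python >= 3.10"
MIN_FREE_GB = 80       # from the recommended 80GB of storage


def check_prerequisites(repo_root=".", min_free_gb=MIN_FREE_GB):
    """Return a dict of pass/fail flags for the stated requirements.

    repo_root is assumed to be the cloned repository directory.
    """
    root = Path(repo_root)
    free_gb = shutil.disk_usage(root).free / 1024**3
    return {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "disk_ok": free_gb >= min_free_gb,
        "dataset_dir_present": (root / "dataset").is_dir(),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Running it from the repository root prints one line per check; a missing dataset/ directory simply means the telemetry data has not been downloaded yet.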

Highlighted Details

  • Benchmark for assessing LLM root cause analysis capabilities in software operating scenarios.
  • Analyzes KPI time series, dependency trace graphs, and semi-structured log text.
  • RCA-agent employs Python for data handling to manage context length and enhance scalability.
  • Includes reproduction scripts for paper results, requiring API configuration (e.g., OpenAI).

Maintenance & Community

The project is authored by Junjielong Xu et al. and accompanies their ICLR'25 paper. The README provides no explicit community channels (e.g., Discord, Slack) or roadmap links.

Licensing & Compatibility

The specific open-source license is not explicitly stated. The project is owned by Microsoft, and its "Disclaimer" section includes clauses regarding user indemnification and compliance with third-party model licenses, suggesting potential restrictions for commercial use or integration into closed-source systems.

Limitations & Caveats

Significant hardware resources (80GB storage, 32GB RAM) are required. The telemetry dataset download is a manual, multi-step process. The absence of a clear license is a notable adoption blocker. The project's research-oriented nature and strong indemnification clauses in the disclaimer warrant careful review before adoption.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 1
  • Star History: 30 stars in the last 30 days
