incidentfox by incidentfox

AI SRE platform for automated incident investigation

Created 5 months ago

641 stars

Top 51.1% on SourcePulse

1 Expert Loves This Project

Gabriel Almeida

Cofounder of Langflow

Project Summary

IncidentFox is an AI-powered SRE platform that automates incident investigation, root cause analysis, and mitigation suggestions. It targets production on-call engineers and aims to cut incident resolution time and alert noise by integrating with observability, infrastructure, and collaboration tools. The platform provides an AI copilot for debugging workflows.

How It Works

IncidentFox employs AI agents that autonomously generate hypotheses, gather data from sources such as codebases and Slack history, and reason over that evidence to identify root causes. Its stated core differentiator is that it learns and adapts to an organization's specific context automatically, which is intended to remove the need for extensive manual integration setup. It prioritizes a Slack-native user experience, so engineers can debug directly within their communication channels.
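The hypothesize, gather, and reason loop described above can be illustrated with a minimal sketch. This is not IncidentFox's implementation: the `Hypothesis` class, the signal names, and the overlap-based scoring are all hypothetical stand-ins chosen to keep the example self-contained, and a real agent would presumably use an LLM rather than set intersection to weigh evidence.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    cause: str
    # Signals we would expect to observe if this cause were real (hypothetical names).
    expected_signals: frozenset


def gather_evidence(sources):
    """Merge the signals reported by every source into one set."""
    evidence = set()
    for signals in sources.values():
        evidence.update(signals)
    return evidence


def rank_hypotheses(hypotheses, evidence):
    """Score each hypothesis by the fraction of its expected signals that were observed."""
    scored = []
    for h in hypotheses:
        matched = h.expected_signals & evidence
        score = len(matched) / len(h.expected_signals) if h.expected_signals else 0.0
        scored.append((score, h.cause))
    # Highest-scoring hypothesis first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy data standing in for logs, deploy history, and Slack messages.
sources = {
    "logs": {"oom_killed", "restart_loop"},
    "deploys": {"recent_deploy"},
    "slack": {"memory_alert"},
}
hypotheses = [
    Hypothesis("memory leak in new release",
               frozenset({"oom_killed", "recent_deploy", "memory_alert"})),
    Hypothesis("database outage",
               frozenset({"db_timeout", "connection_refused"})),
]

ranking = rank_hypotheses(hypotheses, gather_evidence(sources))
print(ranking[0])
```

In this toy data every signal expected by the memory-leak hypothesis is present, so it ranks first, while the database-outage hypothesis matches nothing.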

Quick Start & Requirements

Primary Install/Run: Local Docker setup (≈5 minutes), Self-Host (≈30 minutes), or try instantly in Slack.
Prerequisites: Docker is required for local deployment. Self-hosting assumes standard production infrastructure. Because the platform supports the OpenAI and Claude LLM SDKs, an API key for the chosen provider is likely needed for full functionality.
Links: Setup Guide (Local Docker), Deployment Guide (Self-Host).

Highlighted Details

Slack-Native UX: Enables debugging and investigation directly within Slack, minimizing context switching.
Contextual AI: Automatically analyzes codebase, Slack history, and past incidents to build a deep understanding of the organization's systems and workflows.
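RAPTOR Knowledge Base: Uses RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval, ICLR 2024), which organizes documents into a hierarchical tree of summaries to manage context across long documents; it is described as outperforming standard RAG.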
Broad Integrations: Offers 300+ available integrations across logs, metrics, cloud infrastructure, and developer tools, with more planned.
Advanced AI Capabilities: Features smart log sampling, a 3-layer alert correlation engine, anomaly detection using Meta's Prophet, and automatic dependency mapping.
Model Flexibility: Supports both OpenAI and Claude LLM SDKs, allowing users to choose their preferred models.
Enterprise-Grade Security: Includes SOC 2 compliance, sandboxed agent execution, secrets proxying, SSO/OIDC integration, and on-premise deployment options.
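
The hierarchical tree behind the RAPTOR Knowledge Base item can be sketched in a few lines. This is a simplified illustration, not the project's code and not the published algorithm: RAPTOR clusters chunks by embedding similarity and summarizes each cluster with an LLM, whereas this sketch groups neighbors in fixed-size batches, uses the first sentence of each child as a placeholder "summary", and retrieves by keyword overlap. All names (`Node`, `build_tree`, `retrieve`) and the sample incident notes are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    level: int
    children: list = field(default_factory=list)


def summarize(texts):
    # Placeholder for an LLM summary: keep only the first sentence of each child.
    return " | ".join(t.split(".")[0] for t in texts)


def build_tree(chunks, group_size=2):
    """Build summary layers bottom-up until a single root remains."""
    if not chunks or group_size < 2:
        raise ValueError("need at least one chunk and group_size >= 2")
    level = [Node(text=c, level=0) for c in chunks]
    all_nodes = list(level)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), group_size):
            group = level[i:i + group_size]
            parents.append(Node(summarize([n.text for n in group]),
                                group[0].level + 1, group))
        all_nodes.extend(parents)
        level = parents
    return level[0], all_nodes


def tokens(text):
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, nodes, top_k=2):
    """Search leaves and summaries together, ranked by keyword overlap."""
    q = tokens(query)
    return sorted(nodes, key=lambda n: len(q & tokens(n.text)), reverse=True)[:top_k]


chunks = [
    "Checkout latency rose after the 14:00 deploy. Rollback fixed it.",
    "Payment gateway returned 502 errors. Upstream provider had an outage.",
    "Disk filled on the logging node. Log rotation was misconfigured.",
    "Cache eviction storm hit the API tier. TTLs were set too low.",
]
root, all_nodes = build_tree(chunks)
hits = retrieve("disk logging node", all_nodes)
print(root.level, len(all_nodes), hits[0].text)
```

Searching every level at once means a query can match either a detailed leaf or a higher-level summary, which is the property that makes a tree of this kind useful for long documents.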

Maintenance & Community

The project is developed by the IncidentFox team, which welcomes contributions through GitHub issues, particularly those labeled "good first issue." The README does not mention dedicated community channels such as Discord or Slack.

Licensing & Compatibility

The project is licensed under the Apache License 2.0, which permits commercial use and integration into closed-source projects. Users can bring their own LLM keys and deploy the platform anywhere.

Limitations & Caveats

Some advanced features are designated as "Managed (premium features)," indicating they are not part of the open-source offering. The "Coming Soon" list for integrations highlights current gaps in connectivity. Although the platform is designed for rapid setup, the effectiveness of its auto-learned context in highly complex or bespoke environments should be evaluated before relying on it. Hardware and OS requirements for self-hosting are not specified beyond general production readiness.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

24 stars in the last 30 days