RunbookHermes by Tommy-yw

AI agent for evidence-driven incident response and automated remediation

Created 2 months ago

534 stars

Top 58.5% on SourcePulse

Summary

RunbookHermes targets AIOps incident response by combining evidence-driven diagnosis, approval-gated remediation, and runbook learning in a single system. It is built as a Hermes-native extension and lets engineers and operators automate incident resolution and capture what they learn from it as reusable runbooks.

How It Works

This project adapts the official Hermes Agent runtime into a specialized AIOps incident-response system. It builds on four parts of Hermes (the runtime loop, tool system, memory, and safety boundaries) and extends them with an evidence-centric context engine (EvidenceStack) and a domain-specific memory (IncidentMemory). The design follows a fixed order of priorities: collect evidence reliably from observability sources, compress that evidence into a context small enough for AI reasoning, and pass every remediation action through safety gates. Resolved incidents are then turned into reusable runbook skills.
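The collect-then-compress step described above can be sketched as follows. This is a minimal illustration and not the project's EvidenceStack API: the `Evidence` class, the `score` field, and the character budget are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    source: str    # "metrics", "logs", or "traces"
    summary: str   # short human-readable finding
    score: float   # hypothetical relevance to the incident, 0.0 to 1.0


def compress(evidence: list[Evidence], max_chars: int) -> list[Evidence]:
    """Greedily keep the highest-scoring items whose summaries fit the budget."""
    kept: list[Evidence] = []
    used = 0
    for item in sorted(evidence, key=lambda e: e.score, reverse=True):
        if used + len(item.summary) <= max_chars:
            kept.append(item)
            used += len(item.summary)
    return kept


def render_context(evidence: list[Evidence]) -> str:
    """Format the kept evidence as a compact block for a model prompt."""
    return "\n".join(f"[{e.source}] {e.summary}" for e in evidence)


# Made-up findings, standing in for results from Prometheus, Loki, and Jaeger.
collected = [
    Evidence("metrics", "p99 latency rose sharply on payment-api", 0.9),
    Evidence("logs", "repeated connection pool timeout errors", 0.8),
    Evidence("traces", "slow span in db.query", 0.6),
    Evidence("logs", "routine health check passed", 0.1),
]
kept = compress(collected, max_chars=90)
context = render_context(kept)
```

A greedy cut by score is the simplest possible compression policy; a real context engine would more likely deduplicate, summarize, or weight evidence by source.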

Quick Start & Requirements

Primary Install/Run:
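- Web/API Only (Windows cmd syntax): set PYTHONPATH=. && python -m uvicorn apps.runbook_api.app.main:app --host 127.0.0.1 --port 8000
  On a POSIX shell the equivalent is: PYTHONPATH=. python -m uvicorn apps.runbook_api.app.main:app --host 127.0.0.1 --port 8000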
- Local Reference Environment: cd demo/payment_system && docker compose up --build
Prerequisites: Python, Docker (for local env), environment variables for configuring model providers (OpenAI-compatible), observability backends (Prometheus, Loki, Jaeger), messaging platforms (Feishu, WeCom), and execution adapters.
Links: Web Console (http://127.0.0.1:8000/web/index.html), API Docs (http://127.0.0.1:8000/docs), Roadmap (ROADMAP.md).
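Since the prerequisites are configured through environment variables, a settings loader along these lines may help show the shape of that configuration. The variable names here are hypothetical (this summary does not list the real ones, so consult the README), and the defaults only reflect the usual local ports of each backend.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    model_base_url: str
    prometheus_url: str
    loki_url: str
    jaeger_url: str


# Hypothetical variable names, for illustration only.
DEFAULTS = {
    "EXAMPLE_MODEL_BASE_URL": "https://api.openai.com/v1",
    "EXAMPLE_PROMETHEUS_URL": "http://127.0.0.1:9090",
    "EXAMPLE_LOKI_URL": "http://127.0.0.1:3100",
    "EXAMPLE_JAEGER_URL": "http://127.0.0.1:16686",
}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read each setting from env, falling back to the local default."""
    def get(name: str) -> str:
        return env.get(name, DEFAULTS[name])

    return Settings(
        model_base_url=get("EXAMPLE_MODEL_BASE_URL"),
        prometheus_url=get("EXAMPLE_PROMETHEUS_URL"),
        loki_url=get("EXAMPLE_LOKI_URL"),
        jaeger_url=get("EXAMPLE_JAEGER_URL"),
    )


settings = load_settings()
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.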

Highlighted Details

- Hermes-Native AIOps: Extends Hermes Agent with an incident-response domain layer.
- Evidence-Driven Diagnosis: Integrates metrics, logs, and traces for root-cause analysis.
- Approval-Gated Remediation: Implements safety checks (approval, checkpoint, dry-run, verification) before executing risky actions.
- Runbook Learning: Automatically generates reusable runbook skills from incident resolution experience.
- Web Console: Offers a dashboard for incident management, monitoring, and operational control.
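The approval-gated flow listed above (approval, checkpoint, dry-run, verification) could look roughly like this. It is a sketch under stated assumptions, not the project's implementation: `Action`, `run_gated`, and the rollback-on-failed-verification behavior are hypothetical, and the checkpoint is only a log entry standing in for a real snapshot.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    risky: bool
    apply: Callable[[bool], str]   # apply(dry_run) returns a description of the change
    verify: Callable[[], bool]     # True if the system looks healthy afterwards
    rollback: Callable[[], None]   # restores the checkpointed state


def run_gated(action: Action, approved: bool, log: list[str]) -> str:
    """Run an action through approval, checkpoint, dry-run, execution, and verification."""
    if action.risky and not approved:
        log.append("blocked: approval required")
        return "blocked"
    log.append("checkpoint taken")  # placeholder for a real snapshot
    log.append("dry-run: " + action.apply(True))
    log.append("executed: " + action.apply(False))
    if not action.verify():
        action.rollback()
        log.append("verification failed, rolled back")
        return "rolled_back"
    log.append("verified")
    return "done"


# Toy system state, standing in for a real execution adapter.
state = {"replicas": 2}


def scale_up(dry_run: bool) -> str:
    if not dry_run:
        state["replicas"] = 4
    return "set replicas to 4"


action = Action(
    name="scale-up",
    risky=True,
    apply=scale_up,
    verify=lambda: state["replicas"] == 4,
    rollback=lambda: state.update(replicas=2),
)

audit: list[str] = []
result = run_gated(action, approved=False, log=audit)
```

Without approval the risky action never reaches the dry-run, so `state` is left untouched; calling `run_gated` again with `approved=True` walks through all remaining gates.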

Maintenance & Community

The project builds upon Hermes Agent by Nous Research. Specific community channels (Discord/Slack) or active maintainer details are not provided in the README. A roadmap is referenced via ROADMAP.md.

Licensing & Compatibility

The repository preserves the upstream Hermes Agent license. RunbookHermes additions follow the same license. Specific compatibility for commercial use or closed-source linking is not detailed beyond the license itself.

Limitations & Caveats

The project requires significant manual integration for production readiness, including replacing local JSON stores with robust databases, connecting real model providers and execution systems, and implementing production deployment manifests. While functional, it necessitates further hardening and configuration for enterprise deployment.
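One way to approach the JSON-store replacement mentioned above is to put both backends behind a small interface, so callers do not change when the storage does. The sketch below is illustrative only: `IncidentStore`, `JsonFileStore`, and `SqliteStore` are hypothetical names, and SQLite is used purely because it ships with Python, standing in for whatever production database is chosen.

```python
import json
import sqlite3
from pathlib import Path
from typing import Optional, Protocol


class IncidentStore(Protocol):
    def put(self, incident_id: str, record: dict) -> None: ...
    def get(self, incident_id: str) -> Optional[dict]: ...


class JsonFileStore:
    """Local JSON file store: rewrites the whole file on every put."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> dict:
        return json.loads(self.path.read_text()) if self.path.exists() else {}

    def put(self, incident_id: str, record: dict) -> None:
        data = self._load()
        data[incident_id] = record
        self.path.write_text(json.dumps(data))

    def get(self, incident_id: str) -> Optional[dict]:
        return self._load().get(incident_id)


class SqliteStore:
    """Database-backed store with the same interface and transactional writes."""

    def __init__(self, dsn: str = ":memory:") -> None:
        self.conn = sqlite3.connect(dsn)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS incidents (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def put(self, incident_id: str, record: dict) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO incidents (id, body) VALUES (?, ?)",
                (incident_id, json.dumps(record)),
            )

    def get(self, incident_id: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT body FROM incidents WHERE id = ?", (incident_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None


store: IncidentStore = SqliteStore()
store.put("inc-1", {"service": "payment-api", "status": "open"})
```

Because both classes satisfy the same protocol, switching backends is a one-line change at the point where the store is constructed.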

Health Check

Last Commit: 1 month ago

Responsiveness: Inactive

Star History: 35 stars in the last 30 days