ongrid  by ongridio

AI Ops Agent for infrastructure understanding, root-cause analysis, and automated fixes

Created 1 month ago
295 stars

Top 89.5% on SourcePulse

Project Summary

An AI-powered operations agent, Ongrid automates infrastructure root cause analysis and remediation. It targets SREs and operations teams, offering automated investigation, correlation across metrics, logs, and traces, and secure remote execution directly through chat platforms like Slack and Telegram. The primary benefit is reducing Mean Time To Resolution (MTTR) by providing rapid, AI-driven insights and actions.

How It Works

Ongrid employs a coordinator-specialist agent architecture, dispatching tasks to specialized agents (SRE, network, DB). It automatically investigates alerts by correlating metrics, logs, and traces against infrastructure topology, aiming to pinpoint root causes, even linking them to specific source-code lines via RAG knowledge and code search. A key design choice is its "zero inbound ports" security model, where the agent initiates outbound connections, enhancing security by avoiding open ports on hosts.
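The coordinator-specialist pattern described above can be illustrated with a minimal sketch. Nothing here comes from Ongrid's code: the `Alert` shape, the `Coordinator` class, the specialist functions, and the fallback-to-SRE rule are all hypothetical names and choices made only to show how a coordinator might route an alert to a domain specialist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Alert:
    """Hypothetical alert shape; the real Ongrid schema is not shown in the source."""
    name: str
    domain: str  # e.g. "sre", "network", "db"
    detail: str


# A specialist is modeled as a plain function from an alert to a finding string.
Specialist = Callable[[Alert], str]


def sre_specialist(alert: Alert) -> str:
    return f"[sre] checked host health for {alert.name}: {alert.detail}"


def network_specialist(alert: Alert) -> str:
    return f"[network] checked connectivity for {alert.name}: {alert.detail}"


def db_specialist(alert: Alert) -> str:
    return f"[db] checked query load for {alert.name}: {alert.detail}"


class Coordinator:
    """Routes each alert to the specialist registered for its domain."""

    def __init__(self, specialists: Dict[str, Specialist], fallback: str = "sre"):
        if fallback not in specialists:
            raise ValueError(f"fallback specialist {fallback!r} is not registered")
        self._specialists = specialists
        self._fallback = fallback

    def dispatch(self, alert: Alert) -> str:
        # Unknown domains go to the fallback specialist (an assumption of this sketch).
        handler = self._specialists.get(alert.domain, self._specialists[self._fallback])
        return handler(alert)

    def investigate(self, alerts: List[Alert]) -> List[str]:
        return [self.dispatch(alert) for alert in alerts]


def build_coordinator() -> Coordinator:
    return Coordinator(
        {"sre": sre_specialist, "network": network_specialist, "db": db_specialist}
    )
```

Calling `build_coordinator().dispatch(Alert("db-latency", "db", "p99 above threshold"))` returns a finding produced by the hypothetical DB specialist. A real system would presumably replace the string-returning functions with LLM-backed agents that query metrics, logs, and traces.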

Quick Start & Requirements

Installation involves downloading a pre-compiled release (linux-amd64 or linux-arm64), extracting it, and running sudo ./install.sh. Supported operating systems include Ubuntu 22.04+, Debian 12+, and RHEL/Rocky 9. A full demo video is available.

Highlighted Details

  • Automated Root Cause Analysis (RCA) correlating metrics, logs, traces, and topology.
  • Secure remote execution via "Browser SSH" using reverse tunnels, with audited command execution.
  • Integrated observability stack (Prometheus, Loki, Tempo, Grafana) with agent-driven query generation.
  • Support for multiple LLM providers (OpenAI, Anthropic, Gemini, etc.) with hot-routing capabilities.
  • "Zero inbound ports" security model for agent-host communication.
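To make the "zero inbound ports" idea concrete, here is a small sketch of an outbound-only agent using Python's standard `socket` module. It is not Ongrid's protocol: the function names, the line-based message format, and the `ack:` reply are assumptions for illustration. The point it shows is that only the control plane listens; the host agent dials out and then receives commands over the connection it opened.

```python
import socket
import threading
from typing import List


def run_controller(server_sock: socket.socket, command: str, results: List[str]) -> None:
    """Control plane: accepts one agent connection and sends a command over it."""
    conn, _addr = server_sock.accept()
    with conn:
        conn.sendall(command.encode() + b"\n")
        reply = conn.makefile("r").readline().strip()
        results.append(reply)


def run_agent(controller_host: str, controller_port: int) -> None:
    """Host agent: dials out and never listens. Handles one command, then exits."""
    with socket.create_connection((controller_host, controller_port), timeout=5) as sock:
        command = sock.makefile("r").readline().strip()
        # A real agent would validate, execute, and audit the command.
        # This sketch only acknowledges it.
        sock.sendall(f"ack:{command}\n".encode())


def demo() -> str:
    # Only the controller binds a port; the agent side opens no listening socket.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 asks the OS for a free ephemeral port
    server.listen(1)
    port = server.getsockname()[1]

    results: List[str] = []
    controller = threading.Thread(target=run_controller, args=(server, "uptime", results))
    controller.start()
    run_agent("127.0.0.1", port)
    controller.join(timeout=5)
    server.close()
    return results[0]
```

Running `demo()` on a machine with a working loopback interface returns the agent's acknowledgement of the command. A production design would add TLS, authentication, and reconnection logic, none of which this sketch attempts.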

Maintenance & Community

The source material gives no details on maintainers, community channels (such as Discord or Slack), or a roadmap.

Licensing & Compatibility

The project is licensed under the Apache 2.0 license. This license is generally permissive and compatible with commercial use and closed-source projects.

Limitations & Caveats

The current installation instructions and binary releases are specific to Linux (Ubuntu, Debian, RHEL/Rocky). The "zero inbound ports" and "Browser SSH" features, while innovative, require careful security review before production use. The project appears to be actively developed; v0.8.6 is noted as a recent release.

Health Check

  • Last Commit: 17 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 108
  • Issues (30d): 18

Star History

296 stars in the last 30 days

Explore Similar Projects

Starred by Dan Guido (Cofounder of Trail of Bits), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 1 more.

cai by aliasrobotics

0.6%
9k
Cybersecurity AI (CAI) is an open framework for building AI-driven cybersecurity tools
Created 1 year ago
Updated 1 week ago