codefuse-devops-eval by codefuse-ai

DevOps-Eval: benchmark for LLMs in the DevOps/AIOps domain

created 1 year ago
640 stars

Top 52.9% on sourcepulse

Project Summary

This repository provides DevOps-Eval, a comprehensive benchmark suite for evaluating Large Language Models (LLMs) in the DevOps and AIOps domains. It offers a structured way for developers to track model progress and identify strengths and weaknesses, featuring a large dataset of multiple-choice questions and practical scenarios.

How It Works

DevOps-Eval comprises three main categories: DevOps (general), AIOps (log parsing, time series analysis, root cause analysis), and ToolLearning (function calling across various tools). The benchmark includes both zero-shot and few-shot evaluation settings, with specific data splits for development (few-shot examples) and testing. The evaluation framework allows users to integrate and test their own Hugging Face-formatted models by defining custom loader and context builder functions.
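The sketch below illustrates, under assumptions, what those two hooks might look like for a Hugging Face causal language model. The function names load_model_and_tokenizer and build_context are illustrative placeholders; the actual hook names and signatures expected under src/ may differ.

```python
# Hedged sketch of the two hooks a custom Hugging Face-formatted model might
# need: one to load the model, one to assemble a multiple-choice prompt.
# Names and signatures here are assumptions, not the framework's actual API.
from transformers import AutoModelForCausalLM, AutoTokenizer


def load_model_and_tokenizer(model_path: str):
    """Load a causal LM and its tokenizer from a local path or the Hub."""
    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)
    model.eval()
    return model, tokenizer


def build_context(question: str, choices: dict, few_shot_examples=None) -> str:
    """Assemble a multiple-choice prompt, optionally prepending few-shot examples."""
    parts = []
    for ex in few_shot_examples or []:
        parts.append(
            f"{ex['question']}\n"
            + "\n".join(f"{k}. {v}" for k, v in ex["choices"].items())
            + f"\nAnswer: {ex['answer']}\n"
        )
    parts.append(
        f"{question}\n"
        + "\n".join(f"{k}. {v}" for k, v in choices.items())
        + "\nAnswer:"
    )
    return "\n".join(parts)
```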

Quick Start & Requirements

  • Data Download: Download devopseval-exam.zip or load the data via the Hugging Face datasets library with load_dataset("DevOps-Eval/devopseval-exam"); see the loading sketch after this list.
  • Evaluation: Run python src/run_eval.py with specified model paths, configurations, and dataset details.
  • Prerequisites: Python, Hugging Face datasets, pandas. Model-specific dependencies will vary.
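For the Hugging Face route, a minimal loading sketch is shown below. The repository ID comes from the bullet above; the category passed as name ("UnitTesting") and the dev/test split names follow the convention described in How It Works, but should be checked against the dataset card.

```python
# Minimal sketch: load one category of the exam via the Hugging Face datasets
# library. The category name and split names below are assumptions; verify
# them against the dataset card before use.
from datasets import load_dataset

dataset = load_dataset("DevOps-Eval/devopseval-exam", name="UnitTesting")

print(dataset["dev"][0])   # few-shot development examples
print(dataset["test"][0])  # held-out evaluation questions
```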

Highlighted Details

  • Contains 7486 multiple-choice questions across 8 general DevOps categories.
  • Includes 2840 AIOps samples covering log parsing, time series anomaly detection, classification, forecasting, and root cause analysis.
  • Features 1509 ToolLearning samples across 59 fields and 239 tool scenes, compatible with OpenAI's Function Calling format (illustrated after this list).
  • Provides a public leaderboard for comparing model performance.
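As a rough illustration of the Function Calling-compatible shape, a ToolLearning-style sample might resemble the Python dict below. The field names and the tool itself are invented for illustration and do not come from the dataset; consult the ToolLearning data for the exact schema.

```python
# Hedged illustration of an OpenAI Function Calling-style sample. All values
# here are made up; only the general shape (tool schema, user message,
# expected call) reflects the format the ToolLearning samples target.
tool_learning_sample = {
    "functions": [
        {
            "name": "restart_service",  # hypothetical tool
            "description": "Restart a named service on a host.",
            "parameters": {
                "type": "object",
                "properties": {
                    "host": {"type": "string"},
                    "service": {"type": "string"},
                },
                "required": ["host", "service"],
            },
        }
    ],
    "messages": [
        {"role": "user", "content": "nginx on web-01 is down, please restart it."}
    ],
    "expected_call": {  # ground-truth function call the model should emit
        "name": "restart_service",
        "arguments": {"host": "web-01", "service": "nginx"},
    },
}
```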

Maintenance & Community

The project is actively updated, with recent additions including ToolLearning samples and AIOps leaderboards. Links to Hugging Face and Chinese/English tutorials are provided.

Licensing & Compatibility

Licensed under the Apache License (Version 2.0). This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The project is still under development, with planned additions including more samples, harder difficulty levels, and an English version of the samples. The "Coming Soon" note for citation suggests the primary research paper is not yet published.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days

