logparser by logpai

ML toolkit for automated log parsing

created 10 years ago
1,796 stars

Top 24.5% on sourcepulse

Project Summary

This repository provides a machine learning toolkit and benchmarks for automated log parsing. It extracts event templates from unstructured logs, turning raw messages into structured events for downstream log analytics. It is aimed at researchers developing new log parsing methods and practitioners evaluating existing ones.

How It Works

The toolkit implements a range of log parsing algorithms, including Drain, Spell, and Logram, each of which takes a different approach to identifying log message templates: Drain uses a fixed-depth tree, Spell parses logs as a stream, and Logram relies on n-gram dictionaries. In each case the goal is to cluster similar log messages and extract a common template in which the variable parts are replaced by parameterized placeholders. Reducing large volumes of raw log lines to a small set of templates is what makes downstream analysis tractable.
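To make the idea of clustering messages and extracting templates concrete, here is a minimal sketch in Python. It is not the implementation of Drain, Spell, Logram, or any other parser in the toolkit: it is a simplified illustration that groups messages by token count, compares them position by position, and replaces differing tokens with a placeholder. The function names, the `<*>` placeholder, and the 0.5 similarity threshold are assumptions chosen for this example.

```python
from typing import Dict, List

WILDCARD = "<*>"  # placeholder for a variable token (assumed notation)


def similarity(template: List[str], tokens: List[str]) -> float:
    """Fraction of positions where the template and the message agree."""
    matches = sum(1 for t, tok in zip(template, tokens) if t == tok)
    return matches / len(tokens)


def merge(template: List[str], tokens: List[str]) -> List[str]:
    """Keep tokens that agree; replace the rest with the wildcard."""
    return [t if t == tok else WILDCARD for t, tok in zip(template, tokens)]


def extract_templates(lines: List[str], threshold: float = 0.5) -> List[str]:
    """Cluster log lines of equal token count and return their templates."""
    groups: Dict[int, List[List[str]]] = {}  # token count -> templates
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip empty lines
        candidates = groups.setdefault(len(tokens), [])
        best_idx, best_sim = -1, 0.0
        for i, tpl in enumerate(candidates):
            sim = similarity(tpl, tokens)
            if sim > best_sim:
                best_idx, best_sim = i, sim
        if best_idx >= 0 and best_sim >= threshold:
            # Similar enough: fold the message into the existing template.
            candidates[best_idx] = merge(candidates[best_idx], tokens)
        else:
            # No close match: the message starts a new template.
            candidates.append(tokens)
    return [" ".join(tpl) for tpls in groups.values() for tpl in tpls]


logs = [
    "Connected to 10.0.0.1 port 22",
    "Connected to 10.0.0.2 port 80",
    "Disk usage at 91 percent",
]
templates = extract_templates(logs)
for template in templates:
    print(template)
```

With these three sample lines, the two connection messages collapse into one template with two placeholders, while the disk message stays as its own template. The parsers in the toolkit replace the linear scan over candidates with more scalable structures, such as the fixed-depth tree or n-gram dictionaries mentioned above.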

Quick Start & Requirements

  • Install via pip: pip install logparser3
  • Python 3.6+
  • regex==2022.3.2 (recommended)
  • Additional dependencies for specific parsers (e.g., deap for MoLFI, torch for NuLog, openai for DivLog).
  • Demos and benchmarks are available within the respective parser directories (e.g., logparser/Drain/demo.py).
  • Official documentation and examples are linked within the README.

Highlighted Details

  • Supports 18 different log parsing algorithms, including recent ones like DivLog (ICSE'24).
  • Provides benchmark results for parsers on the LogHub_2k datasets.
  • Offers example code for parsing custom log data.
  • Includes a mechanism for users to submit their own parser implementations.

Maintenance & Community

  • Actively updated with support for Python 3 and new parsers.
  • Community discussion via WeChat group or GitHub issues.
  • Key contributors include researchers from institutions associated with the cited papers.

Licensing & Compatibility

  • The primary license is not explicitly stated, but the README asks users to be aware of the licenses of third-party libraries.
  • Some parsers have specific dependencies with potentially different licenses.
  • Recommended for research and benchmarking; production use requires careful consideration of implementation details and licenses.

Limitations & Caveats

The project is primarily geared towards research and benchmarking, with the current implementation noted as "far from ready for production use." Suggestions for production readiness include enhancing efficiency, scalability, failure recovery, and persistence, with Drain3 cited as a reference for practical enhancements.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

61 stars in the last 90 days
