langkit  by whylabs

Open-source toolkit for monitoring LLMs

created 2 years ago
930 stars

Top 40.1% on sourcepulse

Project Summary

LangKit is an open-source toolkit designed for monitoring Large Language Models (LLMs) by extracting key signals from prompts and responses. It targets ML engineers and researchers working with LLMs in production, providing observability into text quality, relevance, security, and sentiment to mitigate risks associated with unpredictable model behavior.

How It Works

LangKit plugs into the whylogs data logging library through User-Defined Functions (UDFs) that compute additional text features whenever prompts and responses are logged. Its metrics are modular: users select the categories they need, such as text quality (readability, complexity), relevance (similarity to themes), security (jailbreaks, prompt injection, hallucinations, refusals), and sentiment/toxicity. This gives granular control over what is observed and lets LLM-specific metrics be added to existing ML observability pipelines.
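The modular, UDF-style design described above can be illustrated with a small standard-library sketch. This is not LangKit's or whylogs' API: the `REGISTRY`, `register`, and `extract` names, the category labels, and the toy metrics are all hypothetical, chosen only to show how metric functions grouped by category might be selected and applied to prompt/response columns.

```python
from typing import Callable, Dict, List

# Hypothetical registry (not LangKit's API): category -> {metric name -> function}
MetricFn = Callable[[str], float]
REGISTRY: Dict[str, Dict[str, MetricFn]] = {}


def register(category: str, name: str) -> Callable[[MetricFn], MetricFn]:
    """Decorator that files a metric function under a category."""
    def decorator(fn: MetricFn) -> MetricFn:
        REGISTRY.setdefault(category, {})[name] = fn
        return fn
    return decorator


@register("text_quality", "char_count")
def char_count(text: str) -> float:
    return float(len(text))


@register("text_quality", "avg_word_length")
def avg_word_length(text: str) -> float:
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


@register("security", "refusal_hint")
def refusal_hint(text: str) -> float:
    # Toy heuristic for illustration only: flags a few refusal phrases.
    lowered = text.lower()
    phrases = ("i cannot", "i can't", "i'm sorry")
    return 1.0 if any(p in lowered for p in phrases) else 0.0


def extract(record: Dict[str, str], categories: List[str]) -> Dict[str, float]:
    """Apply every metric in the selected categories to each text column."""
    features: Dict[str, float] = {}
    for column, text in record.items():
        for category in categories:
            for name, fn in REGISTRY.get(category, {}).items():
                features[f"{column}.{name}"] = fn(text)
    return features


if __name__ == "__main__":
    chat = {"prompt": "Hi there", "response": "I'm sorry, I cannot help."}
    print(extract(chat, ["text_quality", "security"]))
```

Selecting only `["text_quality"]` would skip the security metric entirely, which is the kind of per-category control the section describes.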

Quick Start & Requirements

Highlighted Details

  • Offers metrics for text quality, relevance, security (jailbreaks, prompt injection, hallucinations, refusals), and sentiment/toxicity.
  • Benchmarks show significantly higher throughput for "LLM metrics" and "All metrics" on GPU instances (g4dn.xlarge) compared to CPU (c5.xlarge).
  • Designed for integration with the whylogs observability library.

Maintenance & Community

  • Developed by WhyLabs.
  • Community links are not explicitly provided in the README.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The README indicates a substantial performance drop when enabling "All metrics" on CPU instances, suggesting a strong dependency on GPU acceleration for comprehensive monitoring. Throughput for "All metrics" on a c5.xlarge instance is as low as 0.28 chats/sec.
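To put the 0.28 chats/sec figure in practical terms, the following sketch converts a throughput into per-chat latency and total processing time. The helper names and the 10,000-chat workload are hypothetical and used only for the arithmetic; only the 0.28 chats/sec value comes from the text above.

```python
def seconds_per_chat(chats_per_sec: float) -> float:
    """Average time spent on one chat at a given throughput."""
    return 1.0 / chats_per_sec


def hours_to_process(n_chats: int, chats_per_sec: float) -> float:
    """Wall-clock hours to process n_chats sequentially at that throughput."""
    return n_chats / chats_per_sec / 3600.0


if __name__ == "__main__":
    cpu_all_metrics = 0.28  # chats/sec, "All metrics" on c5.xlarge (from the text)
    workload = 10_000       # hypothetical batch size, for illustration only
    print(f"{seconds_per_chat(cpu_all_metrics):.2f} s per chat")
    print(f"{hours_to_process(workload, cpu_all_metrics):.2f} h for {workload} chats")
```

At this rate each chat takes roughly 3.6 seconds, so a batch of 10,000 chats would occupy a single CPU instance for about 10 hours.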

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 26 stars in the last 90 days
