langkit by whylabs

Open-source toolkit for monitoring LLMs

Created 2 years ago

976 stars

Top 37.7% on SourcePulse


3 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Marc Klingen

Cofounder of Langfuse

Max Deichmann

Cofounder of Langfuse

Project Summary

LangKit is an open-source toolkit designed for monitoring Large Language Models (LLMs) by extracting key signals from prompts and responses. It targets ML engineers and researchers working with LLMs in production, providing observability into text quality, relevance, security, and sentiment to mitigate risks associated with unpredictable model behavior.

How It Works

LangKit plugs into the whylogs data logging library through User-Defined Functions (UDFs) that add derived text features to whatever is logged. Its metrics are organized into modules that can be enabled individually: text quality (readability, complexity), relevance (similarity to themes), security (jailbreaks, prompt injection, hallucinations, refusals), and sentiment/toxicity. Users enable only the groups they need, which gives fine-grained control over what is observed and makes it straightforward to add LLM-specific metrics to an existing ML observability pipeline.
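The modular UDF idea described above can be illustrated with a small standard-library sketch. This is not LangKit's implementation or API: the `METRIC_GROUPS` registry, the `register` decorator, the `enrich` helper, and the three toy metrics are all hypothetical names chosen for illustration, and the keyword-based refusal check is only a stand-in for a real detector.

```python
from typing import Callable, Dict, List

# Hypothetical registry (not LangKit's): group name -> {metric name: function of text}
MetricFn = Callable[[str], float]
METRIC_GROUPS: Dict[str, Dict[str, MetricFn]] = {}


def register(group: str, name: str):
    """Decorator that files a metric function under a named group."""
    def decorator(fn: MetricFn) -> MetricFn:
        METRIC_GROUPS.setdefault(group, {})[name] = fn
        return fn
    return decorator


@register("text_quality", "word_count")
def word_count(text: str) -> float:
    return float(len(text.split()))


@register("text_quality", "avg_word_length")
def avg_word_length(text: str) -> float:
    words = text.split()
    return sum(len(w) for w in words) / len(words) if words else 0.0


@register("security", "refusal_hint")
def refusal_hint(text: str) -> float:
    # Crude keyword check standing in for a real refusal detector.
    lowered = text.lower()
    phrases = ("i cannot", "i can't", "i am unable")
    return 1.0 if any(p in lowered for p in phrases) else 0.0


def enrich(row: Dict[str, str], groups: List[str]) -> Dict[str, object]:
    """Return a copy of the row extended with metrics from the selected groups only."""
    enriched: Dict[str, object] = dict(row)
    for group in groups:
        for name, fn in METRIC_GROUPS[group].items():
            for column in ("prompt", "response"):
                if column in row:
                    enriched[f"{column}.{name}"] = fn(row[column])
    return enriched


if __name__ == "__main__":
    row = {"prompt": "Tell me a joke", "response": "I cannot help with that"}
    print(enrich(row, ["text_quality"]))
    print(enrich(row, ["security"]))
```

Selecting `["text_quality"]` adds only the word-level features, while `["security"]` adds only the refusal flag, which mirrors the pick-the-groups-you-need design in miniature.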

Quick Start & Requirements

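Install via PyPI: pip install langkit[all] (in shells such as zsh that treat square brackets as glob patterns, quote the argument: pip install 'langkit[all]')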
Requires Python and whylogs.
Official notebook example: https://github.com/whylabs/langkit/blob/main/notebooks/langkit-quickstart.ipynb
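Once the package is installed, a first profile can be produced in a few lines. The sketch below follows the usage pattern recalled from the project's README (`llm_metrics.init()` passed as the whylogs schema); treat the exact calls as an assumption to verify against the installed versions and the official notebook. The `profile_chat` wrapper is a hypothetical helper, and the imports sit inside it because loading the metrics module may download and initialize models.

```python
def profile_chat(prompt: str, response: str):
    """Log one prompt/response pair and return the profile as a pandas DataFrame.

    Assumes `langkit[all]` and `whylogs` are installed; the imports are deferred
    because importing the metrics module can trigger model downloads.
    """
    import whylogs as why
    from langkit import llm_metrics  # registers the LLM metric UDFs (assumed per README)

    schema = llm_metrics.init()  # whylogs schema with the LangKit UDFs attached
    results = why.log({"prompt": prompt, "response": response}, schema=schema)
    return results.view().to_pandas()


if __name__ == "__main__":
    summary = profile_chat("Hello!", "World!")
    print(summary.index.tolist())  # names of the logged and derived columns
```

The resulting profile should contain the original `prompt` and `response` columns alongside the derived metric columns; which ones appear depends on the modules enabled.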

Highlighted Details

Offers metrics for text quality, relevance, security (jailbreaks, prompt injection, hallucinations, refusals), and sentiment/toxicity.
Benchmarks show significantly higher throughput for "LLM metrics" and "All metrics" on GPU instances (g4dn.xlarge) compared to CPU (c5.xlarge).
Designed for integration with the whylogs observability library.

Maintenance & Community

Developed by WhyLabs.
Community links are not explicitly provided in the README.

Licensing & Compatibility

License: Apache 2.0.
Compatible with commercial use and closed-source linking.

Limitations & Caveats

The README's benchmarks show a substantial throughput drop when "All metrics" is enabled on CPU instances: on a c5.xlarge, throughput falls as low as 0.28 chats/sec. Comprehensive monitoring therefore depends heavily on GPU acceleration.
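To put the 0.28 chats/sec figure in operational terms, the arithmetic below converts it into per-chat latency and hourly or daily capacity, and estimates how many CPU instances a given load would need. It is a back-of-the-envelope sketch: the helper names and the 5 chats/sec target are hypothetical, and the instance estimate assumes throughput scales linearly across instances, which the source does not state.

```python
import math

# Throughput quoted above for "All metrics" on a c5.xlarge instance.
CPU_ALL_METRICS_CHATS_PER_SEC = 0.28


def capacity(chats_per_sec: float) -> dict:
    """Convert a sustained throughput into latency and capacity figures."""
    if chats_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return {
        "seconds_per_chat": 1.0 / chats_per_sec,
        "chats_per_hour": chats_per_sec * 3600,
        "chats_per_day": chats_per_sec * 86400,
    }


def instances_needed(target_chats_per_sec: float, per_instance_chats_per_sec: float) -> int:
    """Instances required for a target load, assuming linear scaling (an assumption)."""
    return math.ceil(target_chats_per_sec / per_instance_chats_per_sec)


if __name__ == "__main__":
    stats = capacity(CPU_ALL_METRICS_CHATS_PER_SEC)
    print(f"{stats['seconds_per_chat']:.2f} s per chat")
    print(f"{stats['chats_per_hour']:.0f} chats per hour")
    print(f"{stats['chats_per_day']:.0f} chats per day")
    # Hypothetical target load of 5 chats/sec.
    print(instances_needed(5.0, CPU_ALL_METRICS_CHATS_PER_SEC), "instances")
```

At 0.28 chats/sec, a single CPU instance spends roughly 3.6 seconds per chat and handles about 1,008 chats per hour, so a service with sustained traffic above that would need either several CPU instances, a GPU instance, or a reduced set of metrics.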

Health Check

Last commit: 1 year ago.
Responsiveness: Inactive.
Star history: 3 stars in the last 30 days.