giskard-oss by Giskard-AI

Open-source testing framework for AI & LLM systems

Created 3 years ago
4,874 stars

Top 10.3% on SourcePulse

View on GitHub
Project Summary

Giskard is an open-source Python framework for evaluating and testing AI systems, including LLM-based applications like RAG agents and traditional ML models. It aims to identify and mitigate risks related to performance, bias, and security vulnerabilities, offering automated scanning and dataset generation for comprehensive quality assurance.

How It Works

Giskard automates the detection of issues such as hallucinations, prompt injection, and discrimination by analyzing model outputs against predefined or generated test cases. For RAG applications, its RAG Evaluation Toolkit (RAGET) can automatically generate question-answer pairs and relevant contexts from a knowledge base, enabling detailed evaluation of RAG components like the generator, retriever, and knowledge base itself.
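A minimal sketch of that RAGET workflow, assuming the giskard.rag API as described in the project documentation; the dataframe contents and the my_rag_agent object wrapped by answer_fn are hypothetical placeholders:

    import pandas as pd
    from giskard.rag import KnowledgeBase, generate_testset, evaluate

    # Build a knowledge base from a dataframe of document chunks.
    df = pd.DataFrame({"text": [
        "Giskard is an open-source testing framework for AI systems.",
        "RAGET generates question-answer pairs from a knowledge base.",
    ]})
    knowledge_base = KnowledgeBase.from_pandas(df, columns=["text"])

    # Generate question / reference-answer / context triples for evaluation.
    testset = generate_testset(
        knowledge_base,
        num_questions=10,
        agent_description="A chatbot answering questions about Giskard",
    )

    # answer_fn wraps the RAG agent under test (my_rag_agent is hypothetical).
    def answer_fn(question, history=None):
        return my_rag_agent.ask(question)

    # Score each component (generator, retriever, knowledge base, ...).
    report = evaluate(answer_fn, testset=testset, knowledge_base=knowledge_base)
    report.to_html("raget_report.html")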

Quick Start & Requirements

  • Install via pip: pip install "giskard[llm]" -U
  • Supported Python versions: 3.9, 3.10, 3.11.
  • RAG evaluation requires additional libraries such as langchain, langchain-openai, tiktoken, and pypdf.
  • Example Colab notebook available; a minimal scan workflow is also sketched after this list.
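The typical quick-start flow, per the Giskard docs, is to wrap the system under test in giskard.Model and run the automated scan. In this sketch, llm_call and the agent name are hypothetical stand-ins for your own inference code:

    import pandas as pd
    import giskard

    # model_predict maps a dataframe of inputs to a list of text outputs;
    # llm_call is a hypothetical stand-in for your own inference code.
    def model_predict(df: pd.DataFrame):
        return [llm_call(question) for question in df["question"]]

    giskard_model = giskard.Model(
        model=model_predict,
        model_type="text_generation",
        name="Product FAQ agent",  # hypothetical example agent
        description="Answers questions about the product documentation.",
        feature_names=["question"],
    )

    # Scan for hallucination, prompt injection, harmful content, bias, ...
    scan_results = giskard.scan(giskard_model)
    scan_results.to_html("scan_report.html")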

Highlighted Details

  • Detects a wide range of LLM issues including hallucinations, harmful content, prompt injection, and bias.
  • RAGET automatically generates evaluation datasets and scores RAG components (Generator, Retriever, Rewriter, Router, Knowledge Base).
  • Integrates with any model and environment, with a separate library giskard-vision for computer vision tasks.
  • Provides a giskard.scan() function for automated issue detection and scan_results.generate_test_suite() for creating test suites; see the sketch after this list.
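Continuing the quick-start sketch above, the scan findings can be frozen into a reusable regression suite:

    # Turn the scan findings into a test suite and run it.
    test_suite = scan_results.generate_test_suite("My first test suite")
    test_suite.run()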

Maintenance & Community

  • Active community on Discord.
  • Open to contributions with a contribution guide.
  • Sponsorships available via GitHub, with current sponsors including Lunary and Biolevate.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • The RAGET testset generation can be time-consuming depending on the number of questions requested.
  • Supports Python 3.9-3.11 only; newer Python releases may not be immediately compatible.
Health Check

  • Last Commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 14
  • Issues (30d): 1
  • Star History: 77 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft

  • Top 0.1% · 3k stars
  • LLM evaluation framework
  • Created 2 years ago · Updated 1 month ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

human-eval by openai

  • Top 0.4% · 3k stars
  • Evaluation harness for LLMs trained on code
  • Created 4 years ago · Updated 8 months ago