giskard  by Giskard-AI

Open-source testing framework for AI & LLM systems

created 3 years ago
4,740 stars

Top 10.7% on sourcepulse

Project Summary

Giskard is an open-source Python framework for evaluating and testing AI systems, including LLM-based applications like RAG agents and traditional ML models. It aims to identify and mitigate risks related to performance, bias, and security vulnerabilities, offering automated scanning and dataset generation for comprehensive quality assurance.

How It Works

Giskard automates the detection of issues such as hallucinations, prompt injection, and discrimination by analyzing model outputs against predefined or generated test cases. For RAG applications, its RAG Evaluation Toolkit (RAGET) can automatically generate question-answer pairs and relevant contexts from a knowledge base, enabling detailed evaluation of RAG components like the generator, retriever, and knowledge base itself.
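
The RAGET flow described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes the `giskard.rag` entry points `KnowledgeBase.from_pandas`, `generate_testset`, and `evaluate` as shown in the project's README, and it assumes an LLM client (for example an OpenAI API key) is already configured. The helper names `build_testset`, `evaluate_agent`, and `answer_fn`, the `"text"` column name, and the agent description are hypothetical and chosen for illustration only.

```python
def answer_fn(question, history=None):
    """Hypothetical stand-in for a real RAG agent; replace with your own."""
    return "I don't know."


def build_testset(documents, num_questions=10):
    """Build a knowledge base from text passages and generate a test set."""
    # Imports are local so this file loads even before giskard is installed.
    import pandas as pd
    from giskard.rag import KnowledgeBase, generate_testset

    df = pd.DataFrame({"text": documents})  # "text" is an assumed column name
    knowledge_base = KnowledgeBase.from_pandas(df, columns=["text"])
    testset = generate_testset(
        knowledge_base,
        num_questions=num_questions,  # more questions means a longer run
        language="en",
        agent_description="A chatbot answering questions about the documents",
    )
    return knowledge_base, testset


def evaluate_agent(knowledge_base, testset):
    """Score the agent's answers against the generated test set."""
    from giskard.rag import evaluate

    return evaluate(answer_fn, testset=testset, knowledge_base=knowledge_base)
```

Calling `build_testset(["passage one", "passage two"])` followed by `evaluate_agent(...)` would trigger LLM calls, so keeping `num_questions` small is a sensible choice for a first run.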

Quick Start & Requirements

  • Install via pip: pip install "giskard[llm]" -U
  • Supported Python versions: 3.9, 3.10, 3.11.
  • RAG evaluation additionally requires libraries such as langchain, langchain-openai, tiktoken, and pypdf.
  • Example Colab notebook available.
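
Before installing, it can help to confirm that the interpreter is one of the versions listed above. The snippet below is a small sketch using only the standard library; the names `SUPPORTED_VERSIONS` and `is_supported` are hypothetical and not part of Giskard.

```python
import sys

# Versions listed as supported in this summary (an assumption that may go
# stale as the project evolves).
SUPPORTED_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported(version_info=sys.version_info):
    """Return True if the (major, minor) pair is in the supported set."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    status = "supported" if is_supported() else "not listed as supported"
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} is {status}")
```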

Highlighted Details

  • Detects a wide range of LLM issues including hallucinations, harmful content, prompt injection, and bias.
  • RAGET automatically generates evaluation datasets and scores RAG components (Generator, Retriever, Rewriter, Router, Knowledge Base).
  • Works with any model in any environment; a separate library, giskard-vision, covers computer vision tasks.
  • Provides a giskard.scan() function for automated issue detection and scan_results.generate_test_suite() for creating test suites.
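
The scan workflow from the last bullet can be sketched roughly as below. This is a hedged example: the `giskard.Model` arguments follow the pattern shown in the project's README, while `answer_question`, `model_predict`, `run_scan`, the model name, and the description are hypothetical placeholders. LLM-assisted detectors are assumed to need a configured LLM client (for example an OpenAI API key).

```python
def answer_question(question):
    """Hypothetical stand-in for a real LLM or RAG chain call."""
    return f"Placeholder answer to: {question}"


def model_predict(df):
    """Return one output per input row; Giskard passes a pandas DataFrame."""
    return [answer_question(q) for q in df["question"]]


def run_scan():
    """Wrap the prediction function, scan it, and derive a test suite."""
    # Imported here so the file loads even before giskard is installed.
    import giskard

    giskard_model = giskard.Model(
        model=model_predict,
        model_type="text_generation",
        name="Example question answering model",
        description="Answers user questions about a document collection",
        feature_names=["question"],
    )
    scan_results = giskard.scan(giskard_model)
    test_suite = scan_results.generate_test_suite("Example test suite")
    return scan_results, test_suite
```

The generated suite can then be executed with `test_suite.run()`, which makes it possible to rerun the same checks after each model change.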

Maintenance & Community

  • Active community on Discord.
  • Open to contributions with a contribution guide.
  • Sponsorships available via GitHub, with current sponsors including Lunary and Biolevate.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

  • RAGET test set generation can be slow; run time grows with the number of questions requested.
  • Only Python 3.9 to 3.11 are listed as supported, so newer Python versions might not be immediately compatible.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 9
  • Issues (30d): 1
  • Star history: 255 stars in the last 90 days
