ragas by explodinggradients

Toolkit for LLM application evaluation

Created 2 years ago
10,785 stars

Top 4.6% on SourcePulse

Project Summary

Ragas is an open-source toolkit designed to evaluate and optimize Large Language Model (LLM) applications. It provides objective metrics, automated test data generation, and seamless integrations with popular LLM frameworks, enabling data-driven insights and feedback loops for continuous improvement. The target audience includes developers and researchers building and deploying LLM-powered applications.

How It Works

Ragas employs a combination of LLM-based and traditional metrics for precise evaluation. It can automatically generate comprehensive test datasets covering diverse scenarios, reducing the need for manual test case creation. The framework integrates smoothly with tools like LangChain, facilitating a unified workflow for development and evaluation.
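The "traditional" metrics mentioned above are deterministic string or token comparisons rather than LLM judgments. As a generic illustration of what such a metric computes (this is not ragas's actual API, just a toy sketch), a token-overlap F1 between a generated answer and a reference looks like this:

```python
# Toy token-overlap F1 between an answer and a reference -- a generic
# example of a "traditional" (non-LLM) metric; not part of the ragas API.
from collections import Counter

def token_f1(answer: str, reference: str) -> float:
    ans = answer.lower().split()
    ref = reference.lower().split()
    common = Counter(ans) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ans)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat", "the cat sat"))  # identical strings -> 1.0
```

LLM-based metrics such as faithfulness instead prompt a judge model with the answer and retrieved context, which is why an API key is required (see below).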

Quick Start & Requirements

  • Install via pip: pip install ragas
  • Requires Python and an LLM API key (e.g., OpenAI).
  • A complete quickstart guide and documentation are available.
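As a hedged sketch of how an evaluation run is typically wired up (the column names and metric imports below are assumptions and vary across ragas versions; consult the quickstart guide for the exact API), the flow boils down to building a small dataset of question/answer/context rows and handing it to an evaluator:

```python
# Hedged sketch of the data a ragas evaluation expects. Column names are
# assumptions that differ between ragas versions. The evaluate call is
# commented out because it needs `ragas` installed and an LLM API key
# (e.g. OPENAI_API_KEY) in the environment.
sample = {
    "question": ["When was the first Moon landing?"],
    "answer": ["Apollo 11 landed on the Moon in 1969."],
    "contexts": [["Apollo 11 landed on the Moon on July 20, 1969."]],
}

# from datasets import Dataset
# from ragas import evaluate
# from ragas.metrics import faithfulness
# result = evaluate(Dataset.from_dict(sample), metrics=[faithfulness])

print(list(sample))  # the columns an evaluation row provides
```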

Highlighted Details

  • Offers objective metrics for LLM evaluation.
  • Features automated test data generation capabilities.
  • Integrates with popular LLM frameworks and observability tools.
  • Supports building feedback loops using production data.

Maintenance & Community

The project welcomes community contributions and runs a Discord server for discussion. Anonymized usage data is collected, with an opt-out available.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text.

Limitations & Caveats

The README does not detail specific limitations or known issues. The absence of a clearly stated license may impact commercial use or closed-source integration.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 117
  • Issues (30d): 37
  • Star History: 412 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

AlphaCodium by Codium-ai

Top 0.1% on SourcePulse · 4k stars
Code generation research paper implementation
Created 1 year ago · Updated 9 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

human-eval by openai

Top 0.4% on SourcePulse · 3k stars
Evaluation harness for LLMs trained on code
Created 4 years ago · Updated 8 months ago
Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Meng Zhang (Cofounder of TabbyML), and 3 more.

qodo-cover by qodo-ai

Top 0.2% on SourcePulse · 5k stars
CLI tool for AI-powered test generation and code coverage enhancement
Created 1 year ago · Updated 2 months ago
Starred by Pawel Garbacki (Cofounder of Fireworks AI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 14 more.

SWE-bench by SWE-bench

Top 2.3% on SourcePulse · 4k stars
Benchmark for evaluating LLMs on real-world GitHub issues
Created 1 year ago · Updated 18 hours ago