ragas by explodinggradients

Toolkit for LLM application evaluation

Created 2 years ago
10,785 stars

Top 4.6% on SourcePulse

Project Summary

Ragas is an open-source toolkit designed to evaluate and optimize Large Language Model (LLM) applications. It provides objective metrics, automated test data generation, and seamless integrations with popular LLM frameworks, enabling data-driven insights and feedback loops for continuous improvement. The target audience includes developers and researchers building and deploying LLM-powered applications.

How It Works

Ragas employs a combination of LLM-based and traditional metrics for precise evaluation. It can automatically generate comprehensive test datasets covering diverse scenarios, reducing the need for manual test case creation. The framework integrates smoothly with tools like LangChain, facilitating a unified workflow for development and evaluation.
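The "traditional" metrics mentioned above are deterministic string or token comparisons rather than LLM judgments. As a generic illustration of what such a metric computes (this is not ragas's actual API, just a toy sketch), a token-overlap F1 between a generated answer and a reference looks like this:

```python
# Toy token-overlap F1 between an answer and a reference -- a generic
# example of a "traditional" (non-LLM) metric; not part of the ragas API.
from collections import Counter

def token_f1(answer: str, reference: str) -> float:
    ans = answer.lower().split()
    ref = reference.lower().split()
    common = Counter(ans) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(ans)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(token_f1("the cat sat", "the cat sat"))  # identical strings -> 1.0
```

LLM-based metrics such as faithfulness instead prompt a judge model with the answer and retrieved context, which is why an API key is required (see below).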

Quick Start & Requirements

  • Install via pip: pip install ragas
  • Requires Python and an LLM API key (e.g., OpenAI).
  • A complete quickstart guide and documentation are available.
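As a hedged sketch of how an evaluation run is typically wired up (the column names and metric imports below are assumptions and vary across ragas versions; consult the quickstart guide for the exact API), the flow boils down to building a small dataset of question/answer/context rows and handing it to an evaluator:

```python
# Hedged sketch of the data a ragas evaluation expects. Column names are
# assumptions that differ between ragas versions. The evaluate call is
# commented out because it needs `ragas` installed and an LLM API key
# (e.g. OPENAI_API_KEY) in the environment.
sample = {
    "question": ["When was the first Moon landing?"],
    "answer": ["Apollo 11 landed on the Moon in 1969."],
    "contexts": [["Apollo 11 landed on the Moon on July 20, 1969."]],
}

# from datasets import Dataset
# from ragas import evaluate
# from ragas.metrics import faithfulness
# result = evaluate(Dataset.from_dict(sample), metrics=[faithfulness])

print(list(sample))  # the columns an evaluation row provides
```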

Highlighted Details

  • Offers objective metrics for LLM evaluation.
  • Features automated test data generation capabilities.
  • Integrates with popular LLM frameworks and observability tools.
  • Supports building feedback loops using production data.

Maintenance & Community

The project welcomes community contributions and runs a Discord server for discussion. Anonymized usage data is collected, with an opt-out available.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text.

Limitations & Caveats

The README does not detail specific limitations or known issues. The absence of a clearly stated license may impact commercial use or closed-source integration.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 117
  • Issues (30d): 37
  • Star History: 412 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Travis Fischer (Founder of Agentic), and 6 more.

AlphaCodium by Codium-ai

Top 0.1% on SourcePulse · 4k stars
Code generation research paper implementation
Created 1 year ago · Updated 9 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Jared Palmer (Ex-VP of AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

human-eval by openai

Top 0.4% on SourcePulse · 3k stars
Evaluation harness for LLMs trained on code
Created 4 years ago · Updated 8 months ago
Starred by Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), Meng Zhang (Cofounder of TabbyML), and 3 more.

qodo-cover by qodo-ai

Top 0.2% on SourcePulse · 5k stars
CLI tool for AI-powered test generation and code coverage enhancement
Created 1 year ago · Updated 2 months ago
Starred by Pawel Garbacki (Cofounder of Fireworks AI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 14 more.

SWE-bench by SWE-bench

Top 2.3% on SourcePulse · 4k stars
Benchmark for evaluating LLMs on real-world GitHub issues
Created 1 year ago · Updated 18 hours ago