synthesizer by SciPhi-AI

LLM framework for RAG and data creation

Created 2 years ago

629 stars

Top 52.6% on SourcePulse

8 Experts Love This Project

Vasek Mlejnsky

Cofounder of E2B

Shyamal Anadkat

Research Scientist at OpenAI

Omar Sanseviero

DevRel at Google DeepMind

Jeff Hammerbacher

Cofounder of Cloudera

and 4 more!

Project Summary

Synthesizer[ΨΦ] is a Python framework designed for generating synthetic data and implementing Retrieval-Augmented Generation (RAG) pipelines. It targets developers and researchers needing to create custom datasets for LLM training or RAG systems, and those looking to quickly evaluate RAG performance against real-world data sources. The framework offers integrated RAG capabilities and supports multiple LLM providers.

How It Works

Synthesizer employs a modular architecture, allowing users to integrate various LLM providers (Anthropic, OpenAI, vLLM, HuggingFace, SciPhi) and RAG providers (e.g., Agent Search API). It facilitates custom data creation by leveraging LLMs to generate tailored datasets, and enables RAG pipeline evaluation through its rag_harness script, which can benchmark performance against specified data sources and LLM configurations.
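
To make the modular idea concrete, here is a minimal sketch of how LLM and RAG backends could be registered by name and swapped independently. Every name in it (ProviderRegistry, stub_retriever, stub_llm, "stub-search", "stub-llm") is hypothetical and chosen for illustration; it is not Synthesizer's actual API, and the stubs make no network calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical names throughout: this is NOT Synthesizer's actual API.
RetrieveFn = Callable[[str], List[str]]  # query -> context passages
GenerateFn = Callable[[str], str]        # prompt -> completion


@dataclass
class ProviderRegistry:
    """Maps provider names to callables so backends can be swapped by name."""

    llms: Dict[str, GenerateFn] = field(default_factory=dict)
    rags: Dict[str, RetrieveFn] = field(default_factory=dict)

    def answer(self, llm: str, rag: str, query: str) -> str:
        # Retrieve context first, then place it ahead of the question.
        context = "\n".join(self.rags[rag](query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}"
        return self.llms[llm](prompt)


# Stub backends standing in for real providers.
def stub_retriever(query: str) -> List[str]:
    return [f"passage about {query}"]


def stub_llm(prompt: str) -> str:
    # Echoes the prompt so the assembled input is easy to inspect.
    return prompt


registry = ProviderRegistry()
registry.rags["stub-search"] = stub_retriever
registry.llms["stub-llm"] = stub_llm
result = registry.answer("stub-llm", "stub-search", "fermat's last theorem")
```

Keeping retrieval and generation behind separate name-keyed tables means either side can be replaced without touching the other, which is the property the paragraph above describes.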

Quick Start & Requirements

Primary install: pip install sciphi-synthesizer
Prerequisites: SCIPHI_API_KEY environment variable.
Documentation: Synthesizer Documentation
Community: Discord
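
Because the only stated prerequisite is the SCIPHI_API_KEY environment variable, a script can check for it up front and fail with a clear message. The sketch below assumes nothing about the package itself; the helper name require_api_key is hypothetical, and only the variable name comes from the listing above.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return SCIPHI_API_KEY, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of the package.
    """
    env = os.environ if env is None else env
    key = env.get("SCIPHI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SCIPHI_API_KEY is not set; export it before running."
        )
    return key
```

Calling require_api_key() once at startup surfaces a missing key immediately instead of partway through a generation or evaluation run.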

Highlighted Details

Supports custom data generation for LLM training and RAG.
Built-in RAG provider interface with turnkey integration for Agent Search API.
Enables RAG pipeline performance evaluation.
Offers integration with multiple LLM providers including OpenAI, Anthropic, vLLM, and HuggingFace.
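
As an illustration of what evaluating a RAG pipeline involves, the sketch below scores a retrieve-then-generate loop against expected answers. It is not the rag_harness script: the function evaluate_rag, the toy corpus, the stub components, and the substring-match metric are all assumptions made so the example runs offline.

```python
from typing import Callable, Iterable, List, Tuple


def evaluate_rag(
    examples: Iterable[Tuple[str, str]],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> float:
    """Return the fraction of answers that contain the expected string."""
    total = 0
    correct = 0
    for question, expected in examples:
        passages = retrieve(question)
        answer = generate(question, passages)
        total += 1
        # Case-insensitive containment: a deliberately simple metric.
        if expected.lower() in answer.lower():
            correct += 1
    return correct / total if total else 0.0


# Toy corpus and stub components (hypothetical, no network calls).
CORPUS = {
    "capital of france": "Paris is the capital of France.",
    "largest planet": "Jupiter is the largest planet.",
}


def toy_retrieve(question: str) -> List[str]:
    q = question.lower()
    return [text for key, text in CORPUS.items() if key in q]


def toy_generate(question: str, passages: List[str]) -> str:
    # Stands in for an LLM call: answers only from retrieved context.
    return passages[0] if passages else "I don't know."


examples = [
    ("What is the capital of France?", "Paris"),
    ("Which is the largest planet?", "Jupiter"),
    ("Who wrote Hamlet?", "Shakespeare"),
]
score = evaluate_rag(examples, toy_retrieve, toy_generate)
```

Here the third question has no supporting passage, so two of the three answers match. Swapping toy_retrieve or toy_generate for a real provider leaves the scoring loop unchanged.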

Maintenance & Community

The README points to a Discord server for community discussion and offers email support for inquiries.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README.

Limitations & Caveats

The README requires a SCIPHI_API_KEY, which suggests reliance on proprietary services or provider-specific configuration. It highlights specific RAG provider integrations such as "agent-search", but the last commit was 2 years ago, so the project may no longer be under active development.

Health Check

Last Commit

2 years ago

Responsiveness

1 week

Star History

0 stars in the last 30 days