bespokelabsai
Synthetic data curation tool for post-training and structured data extraction
Top 26.9% on SourcePulse
Bespoke Curator is a Python library for generating and curating synthetic data, primarily for LLM post-training and structured data extraction. It is aimed at researchers and developers who build and fine-tune large language models and need to produce datasets at scale.
How It Works
Curator structures data generation as a Python pipeline. Users define each generation step with Pydantic models for structured outputs and custom curator.LLM classes. The library integrates with LLM providers through LiteLLM and vLLM, and supports asynchronous operations, caching, and fault recovery so that large generation runs can scale and resume. Its main emphasis is on parsing responses into structured outputs and on chaining LLM calls into multi-step data pipelines.
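The pattern described above (structured parsing, cached calls, chained steps) can be illustrated with a minimal standard-library sketch. This is not Curator's API: `fake_llm`, `CachedStep`, `Topic`, `Question`, and `run_pipeline` are hypothetical names invented for the illustration, dataclasses stand in for Pydantic models, and the "LLM" is a stub that returns canned JSON.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Topic:
    """Structured output of the first step (stand-in for a Pydantic model)."""
    name: str


@dataclass
class Question:
    """Structured output of the second step."""
    topic: str
    text: str


def fake_llm(prompt: str) -> str:
    """Hypothetical stub for a provider call; always returns JSON text."""
    if prompt.startswith("List topics"):
        return json.dumps([{"name": "caching"}, {"name": "retries"}])
    topic = prompt.rsplit(": ", 1)[-1]
    return json.dumps({"topic": topic, "text": f"What is {topic}?"})


class CachedStep:
    """One pipeline step: call the model, cache the raw response, parse it."""

    def __init__(self, llm: Callable[[str], str], parse: Callable[[Any], Any]):
        self.llm = llm
        self.parse = parse
        self.cache: Dict[str, str] = {}  # prompt hash -> raw response text
        self.calls = 0                   # number of uncached model calls

    def __call__(self, prompt: str) -> Any:
        key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.llm(prompt)
        return self.parse(json.loads(self.cache[key]))


topic_step = CachedStep(fake_llm, lambda rows: [Topic(**r) for r in rows])
question_step = CachedStep(fake_llm, lambda row: Question(**row))


def run_pipeline() -> List[Question]:
    """Chain the steps: each topic from step one becomes a prompt for step two."""
    topics = topic_step("List topics for a quiz")
    return [question_step(f"Write a question about: {t.name}") for t in topics]


if __name__ == "__main__":
    for q in run_pipeline():
        print(q)
```

Running `run_pipeline()` a second time issues no new model calls, because every prompt is already in the step's cache. A real pipeline would persist the cache to disk so that an interrupted run can resume where it stopped.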
Quick Start & Requirements
pip install bespokelabs-curator
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats