Dataset tool for LLM fine-tuning
Top 5.2% on sourcepulse
This project provides a specialized application for creating fine-tuning datasets for Large Language Models (LLMs). It targets users who need to transform domain-specific knowledge into structured training data for LLM APIs, offering an intuitive interface for document processing, question generation, and data export.
How It Works
Easy Dataset leverages intelligent document processing to split uploaded Markdown files into meaningful segments. It then uses LLM APIs to generate questions from these segments and subsequently generate comprehensive answers. The application supports flexible editing of all generated content and offers multiple export formats like Alpaca and ShareGPT in JSON or JSONL.
Quick Start & Requirements
docker build -t easy-dataset .
then docker run -d -p 1717:1717 -v {YOUR_LOCAL_DB_PATH}:/app/local-db --name easy-dataset easy-dataset
.Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project relies on external LLM APIs for question and answer generation, meaning the quality and cost are dependent on the chosen LLM provider.
2 weeks ago
1 day