finetune_dataset_maker by huang1332

Dataset maker for ChatGLM finetuning

Created 2 years ago

608 stars

Top 54.0% on SourcePulse

Project Summary

This tool assists users in creating custom datasets for fine-tuning ChatGLM models, enabling personalized AI responses. It's designed for users who want to generate or curate question-answer pairs for model training, particularly for creating specialized conversational agents.

How It Works

The tool provides a web interface built with Streamlit. Users input questions and can either manually write answers or leverage GPT's API to generate responses. The generated question-answer pairs are then saved in a JSON format compatible with several popular ChatGLM fine-tuning projects.

Quick Start & Requirements

Install: pip install openai==0.28.0 streamlit
Run: streamlit run dataset.py --server.port 2323
Prerequisites: Python 3.x, OpenAI API key (if using GPT generation).
Resources: Requires a web browser and potentially an OpenAI API key.

Highlighted Details

Supports multiple ChatGLM fine-tuning project formats.
Allows manual answer input or GPT-generated answers.
Saves progress for later continuation.
Exports data in JSON format suitable for fine-tuning.

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The project explicitly requires an older version of the openai package (0.28.0) due to API changes, which may pose compatibility issues with newer projects. The README does not specify the project's license.

finetune_dataset_maker by huang1332

Explore Similar Projects

txtchat by neuml

duckduckgpt by KudoAI

dialogbot by shibing624

awesome-chatgpt-project by xianyu110

awesome-gpt by awesome-gptX

chatbot-api by fuzhengwei

dialoqbase by n4ze3m

awesome-chatgpt by sindresorhus

vscode-chatgpt by gencay

chathub by chathub-dev

Qwen by QwenLM

LibreChat by danny-avila