finetune_dataset_maker  by huang1332

Dataset maker for ChatGLM finetuning

created 2 years ago
608 stars

Top 54.7% on sourcepulse

GitHubView on GitHub
Project Summary

This tool assists users in creating custom datasets for fine-tuning ChatGLM models, enabling personalized AI responses. It's designed for users who want to generate or curate question-answer pairs for model training, particularly for creating specialized conversational agents.

How It Works

The tool provides a web interface built with Streamlit. Users input questions and can either manually write answers or leverage GPT's API to generate responses. The generated question-answer pairs are then saved in a JSON format compatible with several popular ChatGLM fine-tuning projects.

Quick Start & Requirements

  • Install: pip install openai==0.28.0 streamlit
  • Run: streamlit run dataset.py --server.port 2323
  • Prerequisites: Python 3.x, OpenAI API key (if using GPT generation).
  • Resources: Requires a web browser and potentially an OpenAI API key.

Highlighted Details

  • Supports multiple ChatGLM fine-tuning project formats.
  • Allows manual answer input or GPT-generated answers.
  • Saves progress for later continuation.
  • Exports data in JSON format suitable for fine-tuning.

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The repository's license is not specified in the README. Compatibility for commercial use or closed-source linking is undetermined.

Limitations & Caveats

The project explicitly requires an older version of the openai package (0.28.0) due to API changes, which may pose compatibility issues with newer projects. The README does not specify the project's license.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.