Chatbot for emulating a character's speech from a TV drama via LoRA fine-tuning
This project provides a complete workflow for fine-tuning large language models (LLMs) on custom script data to create character-specific chatbots. It is aimed at users who want a personalized AI experience, particularly fans of Chinese historical dramas, and lets them reproduce the speech patterns of characters such as Zhen Huan from the popular TV series "Empresses in the Palace."
How It Works
The project applies LoRA (Low-Rank Adaptation) fine-tuning to a base LLM (the demonstration uses Llama 3.1 8B Instruct), training on dialogue extracted from scripts. Its main contribution is the data processing pipeline, which converts raw script text into a structured conversational format suitable for fine-tuning: it extracts character-dialogue pairs, formats them into an instruction-response dataset, and offers suggestions for data augmentation.
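The extraction and formatting step described above can be sketched as follows. This is a minimal illustration, assuming a plain-text script in which each spoken line has the form "Speaker：utterance" (full-width colon, as is common in Chinese scripts, or an ASCII colon). It pairs each line of the target character with the line spoken just before it and writes Alpaca-style "instruction"/"input"/"output" records as JSON Lines. The line format, the pairing rule, the record schema, and the function names parse_script, build_records and to_jsonl are assumptions for illustration, not the project's actual code.

```python
import json
import re

# Assumed script format: "Speaker：utterance" or "Speaker: utterance".
# Chinese scripts usually use the full-width colon "：", so both are accepted.
# The speaker group excludes colons, so a colon inside the utterance is kept.
DIALOGUE_LINE = re.compile(r"^\s*([^:：]+?)\s*[:：]\s*(.+?)\s*$")


def parse_script(lines):
    """Return (speaker, utterance) tuples; lines without a colon
    (scene headers, stage directions) are skipped."""
    turns = []
    for line in lines:
        match = DIALOGUE_LINE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
    return turns


def build_records(turns, target):
    """Hypothetical pairing rule: every line spoken by `target` becomes the
    response, and the preceding line by another speaker becomes the instruction."""
    records = []
    for (prev_speaker, prev_text), (speaker, text) in zip(turns, turns[1:]):
        if speaker == target and prev_speaker != target:
            records.append({"instruction": prev_text, "input": "", "output": text})
    return records


def to_jsonl(records):
    """Serialize one JSON object per line; ensure_ascii=False keeps Chinese readable."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Toy script (hypothetical lines, English for readability).
    script = [
        "Scene 1",
        "Emperor：What is your name?",
        "Zhen Huan：Your servant is called Zhen Huan.",
        "(The attendants withdraw.)",
        "Consort Hua: Will Your Majesty come tonight?",
        "Zhen Huan: The night is long, Sister.",
    ]
    records = build_records(parse_script(script), target="Zhen Huan")
    print(to_jsonl(records))
```

Because scene headers and stage directions are simply skipped, a reply can end up paired with a line from a previous scene; resetting the pairing at scene boundaries would be a natural refinement.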
Quick Start & Requirements
pip install modelscope transformers accelerate peft datasets
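Of the packages installed above, peft is the one that supplies the LoRA adapters. As a dependency-free illustration of what such an adapter computes (not peft's implementation), the sketch below adds a trainable low-rank update B·A to a frozen weight W and scales it by alpha/r, following the formulation in the original LoRA paper. The helper names, the random initialization scale, the toy 3×2 weight and the alpha values are illustrative assumptions; peft's defaults may differ.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_lora(d_in, d_out, r, seed=0):
    """A (r x d_in) gets small random values and B (d_out x r) starts at zero,
    so the adapted layer initially behaves exactly like the frozen one."""
    rng = random.Random(seed)
    A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
    B = [[0.0] * r for _ in range(d_out)]
    return A, B


def lora_forward(W, A, B, x, alpha, r):
    """h = W x + (alpha / r) * B (A x). Only A and B would be trained; W stays frozen."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / r
    return [h + scale * d for h, d in zip(base, delta)]


def trainable_fraction(d_in, d_out, r):
    """Parameters trained by LoRA, r * (d_in + d_out), relative to the full d_in * d_out."""
    return r * (d_in + d_out) / (d_in * d_out)


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy frozen 3x2 weight
    A, B = init_lora(d_in=2, d_out=3, r=1)
    x = [1.0, -1.0]
    print(lora_forward(W, A, B, x, alpha=16, r=1))  # equals W x while B is zero
    print(f"{trainable_fraction(4096, 4096, 8):.2%}")
```

Because B starts at zero, fine-tuning begins from the unmodified base model. For a 4096×4096 weight matrix with r = 8, the adapter trains about 0.39% as many parameters as the full matrix, which is what makes fine-tuning an 8B model practical on modest hardware.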
Maintenance & Community
The project has received recognition in AI competitions, including the 2023 iFlytek Spark Cup and the 2024 OpenBMB Challenge. Contributors are listed as members of Datawhale.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project focuses on Chinese script data and may require significant adaptation for other languages or data formats. The effectiveness of the data extraction and augmentation steps can vary depending on the quality and structure of the input script.