AI chatbot for personalized conversations via WhatsApp data
Top 71.3% on SourcePulse
This repository enables users to create a personalized AI chatbot clone by fine-tuning large language models, specifically Llama3 and Mistral, on their own WhatsApp chat history. It offers a streamlined process for preprocessing chat data, fine-tuning models using LoRA, and interacting with the resulting AI clone via a command-line interface.
How It Works
The project leverages the torchtune library for efficient model fine-tuning and inference. It employs quantized LoRA (Low-Rank Adaptation) for parameter-efficient fine-tuning, significantly reducing VRAM requirements. WhatsApp chat exports are preprocessed into the ShareGPT format, with conversation partners assigned the "human" role and the user's own messages designated as the "gpt" role for training.
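The preprocessing step described above can be sketched roughly as follows. This is a hypothetical minimal version, not the project's actual `preprocess.py`: the function name, the regex (which the README notes varies by region), and the message-merging behavior are assumptions for illustration.

```python
import json
import re

# Matches export lines like "12/03/23, 14:05 - Alice: Hello there".
# WhatsApp export formats differ by region; this pattern is an assumption.
LINE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2} - ([^:]+): (.*)$")

def whatsapp_to_sharegpt(lines, your_name):
    """Convert exported chat lines to a ShareGPT-style conversation:
    your own messages become the "gpt" role (the model learns to
    imitate you); everyone else becomes the "human" role."""
    conversations = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip system messages and unmatched lines
        sender, text = m.group(1), m.group(2)
        role = "gpt" if sender == your_name else "human"
        if conversations and conversations[-1]["from"] == role:
            # Merge consecutive messages from the same speaker into one turn
            conversations[-1]["value"] += "\n" + text
        else:
            conversations.append({"from": role, "value": text})
    return {"conversations": conversations}

chat = [
    "12/03/23, 14:05 - Alice: Hey, are you coming tonight?",
    "12/03/23, 14:07 - Bob: Yes! Leaving in 10 minutes.",
    "12/03/23, 14:07 - Bob: Want me to bring anything?",
]
print(json.dumps(whatsapp_to_sharegpt(chat, "Bob"), indent=2))
```

Training on records shaped this way is what lets the fine-tuned model respond in the user's voice when prompted with a partner's message.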
Quick Start & Requirements

- Install the torchtune library (`pip install .` within the torchtune directory).
- Download model weights with `tune download`. Requires a Hugging Face token and a prior access request for Llama3.
- Export chats as `.txt` files, place them in `data/raw_data`, and run `python preprocess.py "YOUR NAME"`. Regex adjustments in `preprocess.py` may be needed based on region.
- Fine-tune: `tune run lora_finetune_single_device --config config/<model>/qlora_train_config.yaml`.
- Chat: `tune run chat.py --config config/<model>/inference_config.yaml`. Mistral requires quantization first.

Highlighted Details

- Uses the torchtune library for reduced VRAM usage (~30%) and a simpler codebase.

Maintenance & Community
The project has been recently rewritten with torchtune. No specific community links or contributor information are provided in the README.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The preprocessing script's regex may require adjustments for different WhatsApp export formats and regions. Group chat simulation is not currently supported and would require modifications to the preprocessing script and torchtune's ChatDataset. Fine-tuning is stated to work best with English chats.
Last updated 9 months ago; marked inactive.