perzonalized-ai-chatbot by LatentMindAI

AI chatbot for personalized conversations via WhatsApp data

Created 1 year ago
409 stars

Top 71.3% on SourcePulse

Project Summary

This repository enables users to create a personalized AI chatbot clone by fine-tuning large language models, specifically Llama3 and Mistral, on their own WhatsApp chat history. It offers a streamlined process for preprocessing chat data, fine-tuning models with LoRA, and chatting with the resulting AI clone through a command-line interface.

How It Works

The project leverages the torchtune library for efficient model fine-tuning and inference. It employs quantized LoRA (QLoRA, Low-Rank Adaptation) for parameter-efficient fine-tuning, significantly reducing VRAM requirements. WhatsApp chat exports are preprocessed into the ShareGPT format: conversation partners are assigned the "human" role, while the user's own messages are designated as the "gpt" role, so the fine-tuned model learns to respond as the user.
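
As a rough illustration of that data layout (this is not the repository's actual preprocess.py; the regex and field names here are assumptions), a WhatsApp export line such as `12/31/23, 9:15 PM - Alice: Happy new year!` might be mapped to ShareGPT turns like this:

```python
import re
import json

# Hypothetical sketch of the WhatsApp -> ShareGPT mapping described above.
# The timestamp regex varies by region and platform; this one assumes
# US-style Android exports.
LINE_RE = re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s?[AP]M - ([^:]+): (.*)$")

def to_sharegpt(lines, your_name):
    """Map each message to a ShareGPT turn: your messages become the
    'gpt' role (the model learns to imitate you), everyone else 'human'."""
    conversations = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip system notices, omitted media, multi-line continuations
        sender, text = match.groups()
        role = "gpt" if sender.strip() == your_name else "human"
        conversations.append({"from": role, "value": text})
    return {"conversations": conversations}

chat = [
    "12/31/23, 9:15 PM - Alice: Happy new year!",
    "12/31/23, 9:16 PM - Bob: You too!",
]
print(json.dumps(to_sharegpt(chat, "Bob"), indent=2))
```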

Quick Start & Requirements

  • Install: Clone the repository, install PyTorch, and install the modified torchtune library (pip install . within the torchtune directory).
  • Base Model: Download Llama3-8B-Instruct or Mistral-7B-Instruct-v0.2 using tune download. Requires a Hugging Face token and, for Llama3, a prior access request.
  • Data: Export WhatsApp chats as .txt files, place them in data/raw_data, and run python preprocess.py "YOUR NAME". The regex in preprocess.py may need adjusting to match your region's export format; see the sketch after this list.
  • Fine-tuning: Run tune run lora_finetune_single_device --config config/<model>/qlora_train_config.yaml.
  • Inference: Run tune run chat.py --config config/<model>/inference_config.yaml. Mistral requires quantization first.
  • Hardware: Approximately 16 GB VRAM for Llama3 fine-tuning with 4k context.
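
WhatsApp export timestamps differ by region and platform, which is why the README warns that the regex in preprocess.py may need adjusting. A minimal sketch of handling two common variants (both patterns here are illustrative assumptions, not code from the repository, and the list is not exhaustive):

```python
import re

# Hedged sketch: swap the pattern in preprocess.py to match your export.
PATTERNS = {
    # Android, US locale: "12/31/23, 9:15 PM - Alice: Happy new year!"
    "android_us": re.compile(
        r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s?[AP]M - ([^:]+): (.*)$"
    ),
    # iOS, EU locale: "[31.12.23, 21:15:03] Alice: Happy new year!"
    "ios_eu": re.compile(
        r"^\[\d{1,2}\.\d{1,2}\.\d{2,4}, \d{1,2}:\d{2}(?::\d{2})?\] ([^:]+): (.*)$"
    ),
}

def detect_format(sample_lines):
    """Return the name of the first pattern that matches a sampled line."""
    for name, pattern in PATTERNS.items():
        if any(pattern.match(line) for line in sample_lines):
            return name
    return None

print(detect_format(["[31.12.23, 21:15:03] Alice: Happy new year!"]))  # ios_eu
```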

Highlighted Details

  • Utilizes the torchtune library, cutting VRAM usage by roughly 30% and simplifying the codebase.
  • Supports Llama3-8B-Instruct and Mistral-7B-Instruct-v0.2.
  • Fine-tuning via quantized LoRA (QLoRA).
  • Preprocessing script handles WhatsApp chat export conversion.

Maintenance & Community

The project was recently rewritten on top of torchtune. The README provides no community links or contributor information.

Licensing & Compatibility

The README does not state a license, so suitability for commercial use or closed-source linking is unspecified.

Limitations & Caveats

The preprocessing script's regex may require adjustment for different WhatsApp export formats and regions. Group chat simulation is not currently supported; it would require changes to both the preprocessing script and torchtune's ChatDataset. Fine-tuning is stated to work best with English chats.
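
For readers who want to experiment with group chats anyway, one conceivable preprocessing tweak is to collapse all non-user participants into the "human" role while prefixing each message with the sender's name so the model can still distinguish speakers. This is purely a speculative sketch under that assumption, not something the repository supports, and torchtune's ChatDataset would still need matching changes:

```python
# Speculative sketch only: the repository does not support group chats.
# The idea is to keep speaker identity by prefixing each message with
# the sender's name before flattening into ShareGPT turns.
def group_to_sharegpt(messages, your_name):
    """messages: list of (sender, text) tuples from a parsed group export."""
    turns = []
    for sender, text in messages:
        if sender == your_name:
            turns.append({"from": "gpt", "value": text})
        else:
            # All other participants share the 'human' role; the name prefix
            # preserves who said what within the flattened conversation.
            turns.append({"from": "human", "value": f"{sender}: {text}"})
    return {"conversations": turns}

example = [("Alice", "Who's coming tonight?"), ("Bob", "I am!"), ("Carol", "Me too")]
print(group_to_sharegpt(example, "Bob"))
```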

Health Check

  • Last Commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
