li-plus/chat4u
Create a personalized chatbot from WeChat chat logs
Top 99.6% on SourcePulse
Chat4U enables users to train a personalized chatbot using their WeChat chat history. It targets individuals seeking to create a conversational AI that mimics their personal communication style, offering a unique way to leverage personal data for AI training. The primary benefit is a highly customized chatbot experience derived from one's own interactions.
How It Works
The project employs a multi-stage pipeline. First, WeChat chat data is extracted and decrypted. On macOS, this involves using wechat-decipher-macos to obtain the database keys, then sqlcipher to decrypt the msg_*.db files. For other operating systems, alternative (and unverified) methods are suggested. The decrypted SQLite databases are then processed by prepare_data.py to generate a train.json dataset; only single-turn dialogues are currently supported. This dataset is used to fine-tune a LLaMA-7B model with Stanford Alpaca's full fine-tuning approach and DeepSpeed ZeRO-3 on a GPU-enabled Linux machine.
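The data-preparation step can be illustrated with a minimal sketch. It is not the project's prepare_data.py. It assumes the messages have already been read out of the decrypted database into a chronological list, and it uses a hypothetical Message record because the WeChat table schema is not described here. The instruction/input/output field names follow the Stanford Alpaca dataset format; whether train.json uses exactly these fields is an assumption.

```python
import json
from typing import Iterable, NamedTuple


class Message(NamedTuple):
    """Hypothetical record; the real WeChat schema is not covered here."""
    from_me: bool  # True if the account owner sent the message
    text: str


def build_single_turn_pairs(messages: Iterable[Message]) -> list[dict]:
    """Pair each incoming message with the owner's immediately following reply."""
    pairs = []
    previous = None
    for current in messages:
        if (
            previous is not None
            and not previous.from_me      # prompt comes from the other party
            and current.from_me           # response comes from the owner
            and previous.text.strip()
            and current.text.strip()
        ):
            pairs.append({
                "instruction": previous.text.strip(),
                "input": "",              # Alpaca-style optional context, unused
                "output": current.text.strip(),
            })
        previous = current
    return pairs


def write_dataset(pairs: list[dict], path: str = "train.json") -> None:
    """Write the pairs as a JSON array, keeping non-ASCII text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(pairs, f, ensure_ascii=False, indent=2)


# Tiny made-up conversation for illustration only.
chat = [
    Message(False, "Are you free tonight?"),
    Message(True, "Yes, after seven."),
    Message(True, "Where should we meet?"),
    Message(False, "The usual place."),
    Message(False, "Bring an umbrella."),
    Message(True, "Will do."),
]
pairs = build_single_turn_pairs(chat)
```

Calling write_dataset(pairs) would then produce the training file. Pairing only adjacent messages is the simplest reading of "single-turn"; it drops any context from earlier in the conversation, which is consistent with the limitation noted for the data preparation script.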
Quick Start & Requirements
nalzok/wechat-decipher-macos (dtrace script), sqlcipher (via brew install sqlcipher), and Python 3.
wechat-chatgpt and a Python environment for the OpenAI-compatible API server.
wechat-decipher-macos: https://github.com/nalzok/wechat-decipher-macos
stanford_alpaca: https://github.com/tatsu-lab/stanford_alpaca
alpaca-lora: https://github.com/tloen/alpaca-lora
wechat-chatgpt: https://github.com/holgots/wechat-chatgpt
Highlighted Details
wechat-chatgpt and an OpenAI-compatible API server for seamless usage.
Maintenance & Community
This repository appears to be a personal project with no explicit mention of maintainers, community channels (like Discord/Slack), or a public roadmap.
Licensing & Compatibility
The README does not explicitly state a software license for the li-plus/chat4u project itself. However, it relies on other repositories like stanford_alpaca and alpaca-lora, which have their own licenses (typically Apache 2.0 or similar permissive licenses), and uses models like LLaMA, which may have specific usage restrictions. Compatibility for commercial use would require careful review of all underlying components and models.
Limitations & Caveats
The data extraction process relies heavily on macOS and specific WeChat versions, which is a significant barrier for users on other operating systems. The data preparation script currently handles only single-turn dialogues, and the resulting chatbot may make common-sense errors even when it mimics the user's chat style. Training requires substantial GPU resources.
2 years ago
Inactive