WeClone  by xming521

Digital twin one-stop solution

created 1 year ago
15,170 stars

Top 3.4% on sourcepulse

GitHubView on GitHub
Project Summary

WeClone offers a comprehensive solution for creating digital personas by fine-tuning large language models (LLMs) with personal chat history. It targets users who want to imbue AI with their unique communication style or preserve digital legacies, enabling personalized chatbots and voice cloning.

How It Works

The project leverages the LLaMA Factory framework for fine-tuning LLMs, specifically using LoRA or QLoRA techniques for efficient parameter adaptation. Chat data, extracted via PyWxDump and preprocessed to remove PII and filter content, is used to train models like Qwen2.5-7B-Instruct. A separate module, WeClone-audio, handles voice cloning using smaller models and WeChat voice messages. The fine-tuned models can then be deployed via an API service and integrated with various chatbot platforms like AstrBot.

Quick Start & Requirements

  • Install dependencies using uv venv .venv --python=3.10 and uv pip install --group main -e ..
  • Requires Python 3.10+, PyTorch with CUDA support.
  • Recommended: 16GB+ VRAM for 7B models with LoRA/QLoRA.
  • Data extraction requires PyWxDump.
  • Official Docs: https://github.com/xming521/WeClone

Highlighted Details

  • Full pipeline for digital persona creation: data export, preprocessing, model training, and deployment.
  • Supports WeChat chat history fine-tuning for LLMs and voice cloning via WeClone-audio.
  • Integrates with multiple chatbot platforms (WeChat, Telegram, QQ, etc.) via an API service.
  • Offers data preprocessing to remove PII and filter content using a customizable blocked words list.

Maintenance & Community

  • Project is under active development with recent refactoring in v0.2.0.
  • Open to Issues and Pull Requests; discussions for new features are encouraged via Issues.
  • Development dependencies include pytest, pyright, and ruff.

Licensing & Compatibility

  • The repository does not explicitly state a license.
  • The extensive disclaimer warns against illegal use, privacy theft, and requires deletion of code within 24 hours of download, implying a non-commercial, personal-use-only restriction.

Limitations & Caveats

  • The project is in a rapid iteration phase, and current results are not final.
  • Windows environment is not strictly tested; WSL is recommended.
  • Fine-tuning effectiveness heavily depends on model size and data quantity/quality.
  • Tool calling capabilities are disabled after fine-tuning; system prompts must be manually configured in integrated bots.
Health Check
Last commit

5 days ago

Responsiveness

1 day

Pull Requests (30d)
5
Issues (30d)
10
Star History
12,640 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.