Create a digital doppelgänger from your Twitter archive
This project aims to create a personalized AI doppelgänger using a user's Twitter archive and LoRA fine-tuning. It's designed for individuals interested in digital immortality, content generation, and prompt engineering, offering a unique way to preserve and interact with a digital representation of oneself.
How It Works
The project processes a user's Twitter archive to generate an instruction-style JSON dataset. This dataset is then used to fine-tune a Chinese language model (currently ChatGLM) with LoRA. The approach prioritizes sampling for varied responses, aiming for higher interactivity than simple retrieval or Q&A systems. Advanced features include parsing replies for context and using OpenAI to augment training data with questions or preambles.
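A minimal sketch of the first step (archive to instruction-style JSON), not the project's actual code. It assumes the archive stores tweets in a file such as data/tweets.js that assigns a JSON array to a variable like window.YTD.tweets.part0, and that each entry wraps a tweet object with a full_text field under a "tweet" key. The function names and the instruction/input/output record schema are hypothetical.

```python
import json
from typing import Any


def load_archive_js(text: str) -> list[dict[str, Any]]:
    """Parse the body of a Twitter archive file such as data/tweets.js.

    Assumption: the file is a JavaScript assignment of the form
    `window.YTD.tweets.part0 = [ ... ]`, so everything after the first
    '=' is a plain JSON array.
    """
    _, sep, payload = text.partition("=")
    if not sep:
        payload = text  # no assignment prefix: treat the whole text as JSON
    entries = json.loads(payload.strip().rstrip(";"))
    # Assumption: each entry wraps the tweet object under a "tweet" key.
    return [entry.get("tweet", entry) for entry in entries]


def to_instruction_records(
    tweets: list[dict[str, Any]],
    instruction: str = "Write a tweet in my voice.",  # hypothetical prompt
) -> list[dict[str, str]]:
    """Turn tweets into instruction-style records (hypothetical schema)."""
    records = []
    for tweet in tweets:
        text = tweet.get("full_text", "").strip()
        # Skip empty tweets and retweets: they are not the user's own words.
        if not text or text.startswith("RT @"):
            continue
        records.append({"instruction": instruction, "input": "", "output": text})
    return records


if __name__ == "__main__":
    sample = (
        'window.YTD.tweets.part0 = ['
        '{"tweet": {"id_str": "1", "full_text": "Hello world"}}, '
        '{"tweet": {"id_str": "2", "full_text": "RT @x: hi"}}]'
    )
    records = to_instruction_records(load_archive_js(sample))
    print(json.dumps(records, ensure_ascii=False, indent=2))
```

The fixed instruction string is only a placeholder; an augmentation step such as the OpenAI-based one described above is where varied questions or preambles could be generated instead.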
Quick Start & Requirements
pip install -r requirements.txt
conda install cudatoolkit=11.3 (or cudatoolkit=11.8 for an RTX 4090)
Maintenance & Community
The project is an ongoing, fast-prototyping effort, and discussion is welcome. The README does not link to any community channels such as Discord or Slack.
Licensing & Compatibility
The README does not state a license, although it references projects released under various licenses (e.g., MIT, Apache 2.0). Whether the project can be used commercially or linked into closed-source software is therefore unclear.
Limitations & Caveats
The project is at an early, fast-prototyping stage. Parsing replies requires Selenium and a proxy pool, and deleted tweets cannot be retrieved. Dialogue ability is limited by the nature of tweet data, which consists mostly of declarative statements rather than conversational exchanges.
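The reply limitation can be illustrated with a hedged sketch: replies whose parent tweet is in the user's own archive can be paired offline, while any other parent would have to be fetched (the project uses Selenium and a proxy pool for that) and a deleted parent cannot be recovered at all. The field names id_str and in_reply_to_status_id_str are assumed from the archive's tweet objects, and the function name and record schema are hypothetical; this is not the project's parser.

```python
from typing import Any


def build_reply_pairs(
    tweets: list[dict[str, Any]],
) -> tuple[list[dict[str, str]], list[str]]:
    """Pair each reply with its parent tweet when the parent is available locally.

    Returns (pairs, missing_parent_ids). Parents absent from `tweets`
    (other users' tweets, or deleted tweets) are reported, not fetched.
    """
    by_id = {t["id_str"]: t for t in tweets if "id_str" in t}
    pairs: list[dict[str, str]] = []
    missing: list[str] = []
    for tweet in tweets:
        parent_id = tweet.get("in_reply_to_status_id_str")
        if not parent_id:
            continue  # not a reply
        parent = by_id.get(parent_id)
        if parent is None:
            # Would need an online fetch; impossible if the parent was deleted.
            missing.append(parent_id)
            continue
        # The parent tweet becomes the prompt, the reply becomes the target.
        pairs.append({
            "instruction": parent.get("full_text", ""),
            "input": "",
            "output": tweet.get("full_text", ""),
        })
    return pairs, missing
```

Reporting missing parents instead of fetching them keeps the sketch offline; the returned IDs are exactly the cases a scraper would have to handle.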
Last activity: 2 years ago (inactive)