PersonalStyleAI-Framework-  by whyzhow

AI personalization framework for emulating user style

Created 2 months ago
362 stars

Top 77.5% on SourcePulse

GitHubView on GitHub
Project Summary

PersonalStyleAI-Framework- provides a complete pipeline for enabling AI to emulate a user's unique communication style. It targets individuals seeking personalized AI interactions, offering a seamless process from cleaning raw chat data to adapting models like GPT-4, Claude, and Llama 3, and finally solidifying stylistic nuances through lightweight fine-tuning.

How It Works

The framework operates in three core stages: "Data Alchemy" cleans messy chat records (emojis, links, spam) into high-quality JSONL conversation pairs using optimized regular expressions. The "Adaptor Centre" utilizes a factory pattern for a unified interface, allowing effortless switching between different AI models (e.g., OpenAI, Ollama, Claude, Llama 3) without altering business logic. "Style Evolution" employs local lightweight fine-tuning (LoRA via PEFT) to embed language habits directly into model weights, offering a more persistent personalization than prompt engineering alone.

Quick Start & Requirements

Basic installation involves cloning the repository, creating a Python virtual environment, and running pip install -e .. API keys must be configured in a .env file by copying .env.example. Data preprocessing is initiated via python preprocess_data.py using chat records in data/raw/chat.txt, followed by dialogue testing with python main.py. Advanced local fine-tuning requires a CUDA-enabled GPU and installing extra components with pip install -e ".[train]", then running python run_train.py. Key dependencies include Python and Git; CUDA is necessary for training.

Highlighted Details

  • End-to-end pipeline for AI style personalization.
  • Seamless multi-model adaptation (OpenAI, Anthropic, Meta, Ollama) via a unified, factory-patterned interface.
  • Local lightweight fine-tuning (LoRA) for persistent style integration into model weights.
  • Efficient, regex-based data cleaning for noisy social media chat logs.

Maintenance & Community

The project welcomes contributions via Pull Requests and Issues for feature requests and adding new AI adaptors. Specific community channels or contributor details are not provided in the README.

Licensing & Compatibility

The repository's license is not specified in the provided README, making its terms for commercial use or distribution unclear.

Limitations & Caveats

Local fine-tuning necessitates a CUDA-compatible GPU. The absence of a specified license poses a significant adoption blocker for commercial or closed-source integration. The data cleaning is optimized for chat records, potentially requiring adjustments for other text formats.

Health Check
Last Commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
1 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.