superzhang21
Linguistic feature datasets for LLM enhancement
Top 97.6% on SourcePulse
Summary
This project, ghostwriter (影子作家), offers a collection of JSON files containing structured linguistic and stylistic features extracted from various individuals and fictional characters. It targets researchers, writers, and AI developers aiming to analyze, replicate, or condition language models on specific authorial voices, enabling more nuanced AI-generated text.
How It Works
Linguistic features are extracted from diverse sources (social media, literature, speeches) and organized into JSON files named by source and subject, using a surname, full name, or initials (e.g., Weibo_Hu.json). The structured data is designed to be supplied to long-context large language models (LLMs) so that the model can adopt or analyze a specific writing style.
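The naming convention above can be read mechanically. The following is a minimal sketch, assuming every file follows a `<Source>_<Subject>.json` pattern with a single underscore separating the two parts; the `FeatureFile` type and `parse_feature_filename` helper are hypothetical names introduced for illustration and are not part of the project.

```python
from pathlib import Path
from typing import NamedTuple


class FeatureFile(NamedTuple):
    """Hypothetical record for the two parts of a feature filename."""
    source: str   # where the text came from, e.g. "Weibo"
    subject: str  # whose style it describes, e.g. "Hu"


def parse_feature_filename(filename: str) -> FeatureFile:
    """Split '<Source>_<Subject>.json' into source and subject.

    Assumption: the first underscore in the file stem separates the parts.
    """
    path = Path(filename)
    if path.suffix.lower() != ".json":
        raise ValueError(f"expected a .json file, got {filename!r}")
    source, sep, subject = path.stem.partition("_")
    if not sep or not source or not subject:
        raise ValueError(f"expected '<Source>_<Subject>.json', got {filename!r}")
    return FeatureFile(source, subject)


if __name__ == "__main__":
    for name in ("Weibo_Hu.json", "Tiandao_DingYuanying.json", "Public_LuXun.json"):
        print(parse_feature_filename(name))
```

A file whose name does not match the assumed pattern raises `ValueError` rather than being silently misread.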
Quick Start & Requirements
JSON files reside in the data/ directory. Usage involves integrating these files with compatible long-context LLMs. The README does not specify installation commands, non-default prerequisites (GPU, CUDA, Python versions), or setup time estimates.
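Since the README gives no usage commands, the following is only one possible way to integrate a file with an LLM: a minimal sketch that loads a feature file and embeds it verbatim in a prompt. The internal schema of the JSON files is not documented here, so the code treats the content as opaque. The `build_style_prompt` function, the prompt wording, and the `<style_features>` delimiter are all assumptions for illustration.

```python
import json
from pathlib import Path


def build_style_prompt(feature_path: Path, task: str) -> str:
    """Return a prompt that embeds a feature file ahead of a writing task.

    Assumption: the file is valid UTF-8 JSON. Its structure is not inspected;
    it is re-serialized and passed through to the model unchanged.
    """
    features = json.loads(Path(feature_path).read_text(encoding="utf-8"))
    # ensure_ascii=False keeps Chinese text readable instead of \u escapes.
    features_text = json.dumps(features, ensure_ascii=False, indent=2)
    return (
        "The JSON below describes the linguistic and stylistic features of a writer.\n"
        "Adopt that style when completing the task.\n\n"
        f"<style_features>\n{features_text}\n</style_features>\n\n"
        f"Task: {task}"
    )


if __name__ == "__main__":
    # Path follows the data/ directory and example filename mentioned above.
    example = Path("data") / "Weibo_Hu.json"
    if example.exists():
        print(build_style_prompt(example, "Write a short comment on today's news."))
    else:
        print(f"{example} not found; clone the repository first.")
```

The resulting string can be sent as the prompt to whichever long-context model is in use. Any such use remains subject to the license terms described under Licensing & Compatibility.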
Highlighted Details
Example files include Weibo_Hu.json (Hu Xijin from Weibo), Tiandao_DingYuanying.json (Ding Yuanying from "Tiandao"), and Public_LuXun.json (Lu Xun from public data, potentially differing from popular perception).
Maintenance & Community
Direct contributions (PRs) are not accepted. Suggestions/issues can be raised but lack guaranteed response or action. Users can submit data for specific feature extraction. Contact: null@linux.do. No community channels are listed.
Licensing & Compatibility
Licensed under CC BY-NC-ND 4.0. Requires attribution, prohibits commercial use, and forbids distribution of modified versions. Suitable for non-commercial research and analysis.
Limitations & Caveats
Direct contributions are not accepted, limiting community involvement. User-submitted suggestions/issues may not be addressed. The "NoDerivatives" license clause restricts creating and distributing derivative works.
8 months ago
Inactive