Corpus for Chinese names and name generation
Top 11.8% on SourcePulse
This repository provides a large corpus of Chinese names intended for natural language processing tasks such as Chinese word segmentation and named entity recognition (NER). It also includes a name generator and datasets of English and Japanese names, for researchers and developers who work with multilingual name data.
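As a rough illustration of how a name list can feed segmentation or NER, the sketch below tags person names in raw text with forward maximum matching. It assumes the corpus has been loaded as a plain list of full names (the file layout is not described here); `build_name_set` and `tag_names` are hypothetical helpers and the names are made-up examples. Real systems usually combine such a lexicon with a statistical model rather than relying on it alone.

```python
from typing import Iterable


def build_name_set(names: Iterable[str]) -> set[str]:
    """Normalize a name list (assumed: one full name per entry) into a lookup set."""
    return {n.strip() for n in names if n.strip()}


def tag_names(text: str, names: set[str], max_len: int = 4) -> list[tuple[int, int, str]]:
    """Forward maximum matching over the name lexicon.

    At each position, take the longest substring (2 to max_len characters)
    found in the name set. Returns (start, end, name) spans, end exclusive.
    """
    spans = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first so a four-character name such as
        # "欧阳娜娜" wins over a shorter prefix.
        for length in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + length]
            if candidate in names:
                match = candidate
                break
        if match:
            spans.append((i, i + len(match), match))
            i += len(match)
        else:
            i += 1
    return spans


names = build_name_set(["张伟", "王芳", "欧阳娜娜"])
print(tag_names("昨天张伟和欧阳娜娜见面了", names))
```

The spans can be converted directly into BIO labels for NER training or used as forced word boundaries for a segmenter.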
How It Works
The project applies big-data and NLP techniques to massive text collections to extract and clean name entities. From these it builds a large-scale Chinese name knowledge graph with over 56 million entries, enriched with attributes such as gender, age, and sentiment. The corpus is the product of extensive cleaning of billions of raw names and aims for high accuracy in NLP applications.
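The extraction and cleaning step is not spelled out here. One plausible minimal version, sketched under stated assumptions, keeps candidate strings of 2 to 4 Chinese characters that begin with a known surname (two-character compound surnames such as 欧阳 are checked first), drops duplicates, and stores each survivor in a record with optional gender, age, and sentiment slots. The `SURNAMES` set, the length limits, and the `NameEntry` schema are all hypothetical, not the project's actual format.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical surname list; a real pipeline would use a full surname table.
SURNAMES = {"王", "李", "张", "刘", "陈", "欧阳", "司马"}

# 2 to 4 characters from the CJK Unified Ideographs block (U+4E00..U+9FFF).
HAN_NAME = re.compile(r"[\u4e00-\u9fff]{2,4}")


@dataclass
class NameEntry:
    """Hypothetical record for one name node in a knowledge graph."""
    name: str
    surname: str
    gender: Optional[str] = None       # filled in later, if inferred
    age: Optional[int] = None
    sentiment: Optional[float] = None


def split_surname(name: str) -> Optional[str]:
    """Return the surname prefix, preferring two-character compound surnames."""
    for length in (2, 1):
        # Require at least one given-name character after the surname.
        if len(name) > length and name[:length] in SURNAMES:
            return name[:length]
    return None


def clean_candidates(candidates: list[str]) -> list[NameEntry]:
    """Keep well-formed candidates with a known surname, dropping duplicates."""
    seen: set[str] = set()
    entries = []
    for raw in candidates:
        name = raw.strip()
        if not HAN_NAME.fullmatch(name) or name in seen:
            continue
        surname = split_surname(name)
        if surname is None:
            continue
        seen.add(name)
        entries.append(NameEntry(name=name, surname=surname))
    return entries
```

A rule filter like this only removes obvious noise; attribute values such as gender or sentiment would have to come from separate models or annotations.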
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project was last updated on March 27, 2024. The primary contributor is "@萌名NameMoe". According to the README, the project is maintained as a way to learn NLP, knowledge graph (KG), and AI technologies.
Licensing & Compatibility
The repository does not explicitly state a license. Anyone reposting the data on Chinese (domestic) platforms is asked to set the download cost to 0 points and to keep the link to the GitHub repository.
Limitations & Caveats
Although the data has been cleaned, the README notes that several datasets still contain "badcase" (bad case) entries, including the Chinese relationship terms and the translated English names. The project is primarily a data resource; apart from the name generator, it provides little explicit tooling.
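The name generator itself is not documented in this summary. For readers who want a simple baseline to compare against, here is a hypothetical sketch of the most basic approach: draw a surname by weight, then append one or two given-name characters at random. `SURNAME_WEIGHTS`, `GIVEN_CHARS`, and `generate_name` are placeholders invented for illustration, not data or code from the project, whose generator may work quite differently.

```python
import random
from typing import Optional

# Illustrative weights only, not real surname frequencies.
SURNAME_WEIGHTS = {"王": 5, "李": 5, "张": 4, "欧阳": 1}
# Placeholder pool of given-name characters.
GIVEN_CHARS = ["伟", "芳", "娜", "敏", "静", "磊", "洋", "婷"]


def generate_name(rng: random.Random, given_len: Optional[int] = None) -> str:
    """Draw a weighted surname, then 1 or 2 given-name characters."""
    surnames = list(SURNAME_WEIGHTS)
    weights = [SURNAME_WEIGHTS[s] for s in surnames]
    surname = rng.choices(surnames, weights=weights, k=1)[0]
    if given_len is None:
        given_len = rng.choice((1, 2))
    given = "".join(rng.choices(GIVEN_CHARS, k=given_len))
    return surname + given


rng = random.Random(42)
print([generate_name(rng) for _ in range(5)])
```

Passing a seeded `random.Random` keeps the output reproducible. A corpus-driven generator would replace the placeholder pools with character statistics estimated from the name data.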