EJDict by kujirahand

Comprehensive English-Japanese dictionary dataset

Created 10 years ago

264 stars

Top 96.5% on SourcePulse

Project Summary

Summary

EJDict-hand provides a comprehensive English-Japanese dictionary dataset dedicated to the public domain under CC0. The data is easy to download and try out, and because it carries no copyright restrictions, developers and researchers can integrate it into any application that needs bilingual lexical resources.

How It Works

The dataset consists of text files split alphabetically, along with a consolidated, sorted version. Each entry follows an EnglishWord\tMeaning (tab-separated) format and uses dedicated notations for synonyms, multiple meanings, parts of speech (e.g., {形}, {動}), regional variants (e.g., 《米》, 《英》), and countability (e.g., 〈C〉, 〈U〉). PHP scripts are included for merging the files and converting the data to SQLite.
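As a rough illustration of the entry format, here is a minimal Python sketch that splits one line into its headword(s) and senses. The summary does not say which characters separate variant spellings or multiple meanings, so the comma in the headword and the " / " between senses are assumptions; check them against the actual files first. Entry and parse_line are hypothetical names, and the sample line is made up, not copied from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    headwords: list[str]  # spellings sharing one definition (comma split is an assumption)
    senses: list[str]     # individual meanings in source order (" / " split is an assumption)


def parse_line(line: str) -> Entry:
    """Parse one 'EnglishWord<TAB>Meaning' line into headwords and senses."""
    word, sep, meaning = line.rstrip("\r\n").partition("\t")
    if not sep:
        raise ValueError(f"no tab separator in line: {line!r}")
    headwords = [w.strip() for w in word.split(",") if w.strip()]
    senses = [s.strip() for s in meaning.split(" / ") if s.strip()]
    return Entry(headwords, senses)


# Made-up sample line in the assumed format.
entry = parse_line("color,colour\t〈C〉〈U〉色 / {動}色を塗る\n")
print(entry.headwords, entry.senses)
```

The notation markers are deliberately left inside each sense here, so they can be interpreted in a separate step.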

Quick Start & Requirements

Data can be downloaded directly from the project's website in text or SQLite formats: http://kujirahand.com/web-tools/EJDictFreeDL.php.
An online testing interface is also available: https://kujirahand.com/web-tools/EJDict.php.
The bundled tools (the merging and SQLite conversion scripts) require PHP. No other software prerequisites are listed for using the data itself.
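For readers not using PHP, the merge-and-convert step can be approximated in Python. This is a sketch, not a port of the project's scripts. The items(word, mean) table and the helper names are hypothetical and may not match the schema of the downloadable SQLite file. Case-insensitive sorting is also an assumption.

```python
import sqlite3
from itertools import chain
from typing import Iterable


def headword(line: str) -> str:
    # The headword is everything before the first tab.
    return line.split("\t", 1)[0]


def merge_files(*files: Iterable[str]) -> list[str]:
    """Concatenate per-letter files and sort the lines by headword.

    Case-insensitive ordering (ties broken by original case) is an assumption.
    """
    lines = (ln for ln in chain(*files) if "\t" in ln)  # drop blank/malformed lines
    return sorted(lines, key=lambda ln: (headword(ln).lower(), headword(ln)))


def build_db(lines: Iterable[str], path: str = ":memory:") -> sqlite3.Connection:
    """Load tab-separated lines into a hypothetical items(word, mean) table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE items (word TEXT NOT NULL, mean TEXT NOT NULL)")
    conn.execute("CREATE INDEX idx_items_word ON items (word)")
    rows = (tuple(ln.rstrip("\r\n").split("\t", 1)) for ln in lines if "\t" in ln)
    conn.executemany("INSERT INTO items (word, mean) VALUES (?, ?)", rows)
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, word: str) -> list[str]:
    """Return every meaning stored for an exact headword match."""
    cur = conn.execute("SELECT mean FROM items WHERE word = ?", (word,))
    return [row[0] for row in cur.fetchall()]


# In-memory stand-ins for two per-letter files.
merged = merge_files(["banana\tバナナ\n", "\n"], ["apple\tりんご\n", "abandon\t見捨てる\n"])
conn = build_db(merged)
print(lookup(conn, "apple"))
```

Passing a file path instead of ":memory:" to build_db writes a persistent database. The index on word keeps exact-match lookups fast on the full dataset.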

Highlighted Details

Released under Public Domain (CC0), offering maximum freedom for use and distribution.
In 2025 the data underwent substantial AI-assisted correction and refinement with tools such as GitHub Copilot and Ollama (running the Qwen3 and Gemma3n models), supplementing earlier manual work.
Features a rich notation system detailing word usage, grammatical roles, and regional differences.
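Because the notation markers are bracketed tokens, they can be decoded mechanically. This Python sketch maps only the six markers named under How It Works to English labels and passes any other marker through unchanged. Treating every {...}, 《...》 and 〈...〉 span as a marker is an assumption about the data. LABELS and annotate are hypothetical names.

```python
import re

# English glosses for the markers named in the summary; the dataset's full
# notation set is larger, so unknown markers are returned as-is.
LABELS = {
    "{形}": "adjective",
    "{動}": "verb",
    "《米》": "US usage",
    "《英》": "UK usage",
    "〈C〉": "countable",
    "〈U〉": "uncountable",
}

# Any {...}, 《...》 or 〈...〉 span is treated as a marker (an assumption).
MARKER = re.compile(r"\{[^}]*\}|《[^》]*》|〈[^〉]*〉")


def annotate(meaning: str) -> tuple[list[str], str]:
    """Return (labels for the markers found, meaning text with markers removed)."""
    labels = [LABELS.get(m, m) for m in MARKER.findall(meaning)]
    text = MARKER.sub("", meaning).strip()
    return labels, text


print(annotate("《米》〈C〉エレベーター"))
```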

Maintenance & Community

Corrections and improvements are welcomed via pull requests or email.
No explicit community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

License: Public Domain (CC0).
Fully compatible with commercial and closed-source applications due to its Public Domain status.

Limitations & Caveats

The project explicitly warns that the data contains some "discriminatory expressions," which users are advised to avoid. AI-assisted corrections improve overall quality but can introduce inaccuracies of their own, although the most recent updates have focused on minor typos.


Explore Similar Projects

corus by natasha

attranslate by fkirc

KeywordGacha by neavo

doctran by finic-ai

DocTranslator by mingchen666

Versatile-OCR-Program by raphael-seo

openwebtext by yet-another-account

pyspark-ai by pyspark-ai

NLPDataSet by liucongg

ebook-GPT-translator by jesselau76

AiNiee by NEKOparapa

JioNLP by dongrixinyu