EJDict by kujirahand

Comprehensive English-Japanese dictionary dataset

Created 10 years ago
254 stars

Top 99.1% on SourcePulse

Project Summary

Summary

EJDict-hand provides a comprehensive English-Japanese dictionary dataset released into the public domain under CC0. The data is easy to download and try out, which makes it useful for developers and researchers working with bilingual lexicographic resources: there are no copyright restrictions to clear, and it can be integrated into a wide range of applications.

How It Works

The dataset consists of text files organized alphabetically, and a consolidated, sorted version is also available. Each entry is a tab-separated line in EnglishWord\tMeaning format. The Meaning field uses dedicated notations for synonyms, multiple meanings, parts of speech (e.g., {形} adjective, {動} verb), regional variants (e.g., 《米》 American, 《英》 British), and countability (e.g., 〈C〉 countable, 〈U〉 uncountable). PHP scripts are included for merging the files and converting the data to SQLite.
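
Reading the EnglishWord\tMeaning lines and merging the per-letter files can be sketched in a few lines of Python. This is a minimal illustration, not the project's PHP merge script: it assumes one entry per line, skips blank lines and lines without a tab, and uses a case-insensitive sort that is an assumption rather than the project's documented ordering. The function names parse_entries and merge_sorted are hypothetical.

```python
from typing import Iterable, Iterator


def parse_entries(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (english_word, meaning) pairs from EnglishWord<TAB>Meaning lines."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        word, sep, meaning = line.partition("\t")
        if not sep:
            continue  # assumption: a line without a tab is treated as malformed and skipped
        yield word.strip(), meaning.strip()


def merge_sorted(sources: Iterable[Iterable[str]]) -> list[tuple[str, str]]:
    """Combine several sources (e.g. one per initial letter) into one sorted list."""
    entries = [entry for src in sources for entry in parse_entries(src)]
    # Hypothetical ordering: case-insensitive headword, original spelling as tiebreaker.
    entries.sort(key=lambda e: (e[0].lower(), e[0]))
    return entries


if __name__ == "__main__":
    # Invented sample lines for illustration; not taken from the dataset.
    demo = merge_sorted([["banana\tバナナ"], ["apple\tりんご"]])
    print(demo)  # [('apple', 'りんご'), ('banana', 'バナナ')]
```

Each source can be any iterable of lines, so an open file object works as well as a list of strings.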

Quick Start & Requirements

  • Data can be downloaded directly from the project's website in text or SQLite formats: http://kujirahand.com/web-tools/EJDictFreeDL.php.
  • An online testing interface is also available: https://kujirahand.com/web-tools/EJDict.php.
  • The bundled merge and SQLite-conversion scripts require PHP. The README lists no other prerequisites for using the data itself.
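
For readers who would rather not run the PHP scripts, loading (word, meaning) pairs into SQLite can also be done with Python's standard-library sqlite3 module. The sketch below is an alternative, not a reproduction of the project's converter: the table name entries, its columns, and the helper names build_db and lookup are hypothetical and may not match the SQLite file offered for download.

```python
import sqlite3
from typing import Iterable


def build_db(entries: Iterable[tuple[str, str]], path: str = ":memory:") -> sqlite3.Connection:
    """Load (word, meaning) pairs into a SQLite table with a hypothetical schema."""
    conn = sqlite3.connect(path)
    # Hypothetical schema; the project's own SQLite file may use other names.
    conn.execute("CREATE TABLE IF NOT EXISTS entries (word TEXT NOT NULL, meaning TEXT NOT NULL)")
    # Index the headword so exact-match lookups do not scan the whole table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_entries_word ON entries (word)")
    conn.executemany("INSERT INTO entries (word, meaning) VALUES (?, ?)", entries)
    conn.commit()
    return conn


def lookup(conn: sqlite3.Connection, word: str) -> list[str]:
    """Return every meaning stored for an exact headword, in insertion order."""
    rows = conn.execute("SELECT meaning FROM entries WHERE word = ? ORDER BY rowid", (word,))
    return [meaning for (meaning,) in rows]
```

Passing a file path instead of ":memory:" writes a persistent database. The pairs can come from splitting each text line on its tab character.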

Highlighted Details

  • Released under Public Domain (CC0), offering maximum freedom for use and distribution.
  • In 2025 the data underwent substantial AI-assisted correction and refinement using tools such as GitHub Copilot and Ollama (with the Qwen3 and Gemma3n models), building on earlier manual corrections.
  • Features a rich notation system detailing word usage, grammatical roles, and regional differences.
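
The bracketed labels described under How It Works can be extracted from a Meaning field with regular expressions. This is a minimal sketch based only on the examples given there ({形}, 《米》, 〈C〉). It assumes {…} always holds a part-of-speech label and that 《…》 may also carry usage labels beyond regional ones. The countability pattern is restricted to C and U so that any other text in the same brackets is ignored. The pattern and function names are hypothetical.

```python
import re

# Patterns inferred from the notation examples above, not from a formal spec.
POS_RE = re.compile(r"\{([^{}]+)\}")    # part of speech, e.g. {形}, {動}
USAGE_RE = re.compile(r"《([^《》]+)》")  # e.g. 《米》, 《英》; assumed to hold other usage labels too
COUNT_RE = re.compile(r"〈([CU])〉")     # 〈C〉 countable, 〈U〉 uncountable only


def extract_labels(meaning: str) -> dict[str, list[str]]:
    """Collect the bracketed labels found in one Meaning field, in order of appearance."""
    return {
        "pos": POS_RE.findall(meaning),
        "usage": USAGE_RE.findall(meaning),
        "countability": COUNT_RE.findall(meaning),
    }
```

Because findall returns labels in order of appearance, the result can be lined up with the individual senses of a multi-meaning entry if needed.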

Maintenance & Community

  • Corrections and improvements are welcomed via pull requests or email.
  • No explicit community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

  • License: Public Domain (CC0).
  • Fully compatible with commercial and closed-source applications due to its Public Domain status.

Limitations & Caveats

The project explicitly warns that the data contains "discriminatory expressions," which users are advised to avoid. The AI-assisted corrections improve overall quality, but they may also have introduced inaccuracies. The most recent updates, however, focused on minor typo fixes.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days
