lawqa_jp by digital-go-jp

Japanese legal QA dataset for LLM validation

Created 6 months ago
258 stars

Top 98.1% on SourcePulse

Project Summary

This dataset provides a multiple-choice question-answering resource for Japanese laws, generated and validated by multiple LLMs. It targets AI researchers and developers, offering a structured way to train and evaluate models on legal knowledge retrieval and understanding.

How It Works

The project offers a curated collection of four-option multiple-choice questions on Japanese legislation. Questions are derived from legal texts, and each record provides the context, instructions, question, choices, and correct answer. A key feature is the use of Markdown formatting within the context to mark legal structure (laws, articles, sections), together with textual references to external legal documents, which makes the context easier to parse by machine. The dataset also includes versions with randomized answer order, so that evaluation is less sensitive to a model's dependence on choice position.
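The randomization idea can be illustrated with a minimal sketch. The `LawQuestion` class and its field names are hypothetical, chosen only to mirror the parts of a record listed above; the dataset's real keys may differ, and this is not the procedure the maintainers used to build the randomized files.

```python
import random
from dataclasses import dataclass


@dataclass
class LawQuestion:
    # Hypothetical field names; the dataset's actual keys may differ.
    context: str
    instruction: str
    question: str
    choices: list[str]
    answer_index: int  # 0-based position of the correct choice


def shuffle_choices(item: LawQuestion, rng: random.Random) -> LawQuestion:
    """Return a copy with the choices reordered and the answer index remapped."""
    order = list(range(len(item.choices)))
    rng.shuffle(order)
    return LawQuestion(
        context=item.context,
        instruction=item.instruction,
        question=item.question,
        # New position j holds the choice that was at old position order[j].
        choices=[item.choices[i] for i in order],
        # The correct choice moves to wherever its old index landed.
        answer_index=order.index(item.answer_index),
    )
```

Passing a seeded `random.Random` keeps the shuffle reproducible, which matters when comparing model scores across runs.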

Quick Start & Requirements

The README provides no installation or execution commands; usage most likely amounts to loading the data files directly. No special prerequisites for using the dataset are listed.
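Since no loading code is provided, the following is a rough sketch of reading one of the JSON files with the Python standard library. The file layout is an assumption: the helper accepts either a top-level list of records or an object that wraps such a list under some key, and it makes no claim about which layout the dataset actually uses. The function names are illustrative.

```python
import json
from pathlib import Path


def parse_questions(text: str) -> list[dict]:
    """Extract question records from JSON text.

    Assumes (unverified) that the JSON is either a list of records
    or an object whose first list-valued entry holds the records.
    """
    data = json.loads(text)
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        for value in data.values():
            if isinstance(value, list):
                return value
    raise ValueError("unrecognized JSON layout")


def load_questions(path: str) -> list[dict]:
    """Read a UTF-8 JSON file and return its question records."""
    return parse_questions(Path(path).read_text(encoding="utf-8"))
```

A call such as `load_questions("selection_randomized.json")` would then return the records, assuming the file sits in the working directory; inspect the keys of the first record before relying on any field name.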

Highlighted Details

  • Dataset generated and validated using multiple Large Language Models (LLMs).
  • Data available in structured JSON and CSV formats.
  • Includes selection_randomized.json and selection_with_reference_randomized.json for evaluating model robustness against answer order and for tasks involving legal reference processing.
  • Context field employs Markdown for legal hierarchy (e.g., ## for law name, ### for article) and includes string references to external laws.
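The Markdown convention described in the last item lends itself to simple parsing. The sketch below splits a context string into per-article sections, assuming only what is stated above: `##` introduces a law name and `###` introduces an article. The function name, the output keys, and the English sample text are illustrative, and deeper heading levels or other markup in the real data are not handled.

```python
import re

# Matches "## ..." (law name) or "### ..." (article) headings only.
HEADING = re.compile(r"^(#{2,3})\s+(.*)$")


def split_context(context: str) -> list[dict]:
    """Group the context text by law and article headings."""
    sections = []
    law = None
    current = None
    for line in context.splitlines():
        match = HEADING.match(line)
        if match and len(match.group(1)) == 2:
            # A new law starts; following articles belong to it.
            law = match.group(2).strip()
            current = None
        elif match:
            current = {"law": law, "article": match.group(2).strip(), "text": []}
            sections.append(current)
        elif current is not None and line.strip():
            current["text"].append(line.strip())
    # Join the collected body lines of each article into one string.
    return [{**s, "text": "\n".join(s["text"])} for s in sections]


sample = "## Civil Code\n### Article 1\nFirst provision.\n### Article 2\nSecond provision."
for section in split_context(sample):
    print(section["law"], "/", section["article"], "/", section["text"])
```

Keeping the law name on every section makes it straightforward to look up an article when a question refers to a different law than the one it is mainly about.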

Maintenance & Community

The provided README does not contain information regarding notable contributors, sponsorships, partnerships, or community channels (e.g., Discord, Slack).

Licensing & Compatibility

The dataset is provided under the Public Data License v1.0 (https://www.digital.go.jp/resources/open-data/public_data_license_v1.0). This license governs the use of Japanese government open data.

Limitations & Caveats

The Q&A content is generated by LLMs and should not be construed as legal advice. Users must verify information against the latest official legal texts, as laws are subject to future amendments.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
