wisdomInterrogatory  by zhihaiLLM

Legal LLM for judicial practice and efficiency

created 1 year ago
520 stars

Top 61.4% on sourcepulse

Project Summary

智海-录问 (wisdomInterrogatory) is a legal large language model developed by Zhejiang University, Alibaba DAMO Academy, and HuaYuan Computing. It aims to enhance judicial practice, digital case construction, and virtual legal consultation services by providing a foundational capability for intelligent legal systems. The model is designed for legal professionals, researchers, and students seeking advanced AI-powered legal assistance.

How It Works

The model builds on the Baichuan-7B base model and is trained in two stages: continued pre-training on 40GB of legal data (documents, cases, Q&A) to inject domain knowledge, then instruction fine-tuning on 100k examples to enable conversational Q&A. It also incorporates a knowledge retrieval and fusion mechanism that uses both statistical and semantic features to perform multi-hop retrieval across several legal knowledge bases (statutes, cases, templates, books, exams, Q&A).
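The summary does not say how the statistical and semantic features are combined, so the following is only a minimal single-hop sketch of one common approach: a weighted sum of a keyword-overlap score and an embedding cosine similarity. The function names, the `alpha` weight, and the toy two-dimensional vectors are all assumptions made for illustration, not the project's actual retriever.

```python
import math


def keyword_score(query_tokens, doc_tokens):
    """Statistical feature: fraction of query tokens found in the document."""
    if not query_tokens:
        return 0.0
    doc_set = set(doc_tokens)
    return sum(1 for t in query_tokens if t in doc_set) / len(query_tokens)


def cosine(u, v):
    """Semantic feature: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hybrid_rank(query_tokens, query_vec, docs, alpha=0.5):
    """Rank docs by alpha * keyword_score + (1 - alpha) * cosine.

    docs is a list of (doc_id, tokens, vector) tuples. alpha is a
    hypothetical mixing weight, not a value taken from the project.
    """
    scored = []
    for doc_id, tokens, vec in docs:
        score = (alpha * keyword_score(query_tokens, tokens)
                 + (1 - alpha) * cosine(query_vec, vec))
        scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy knowledge base; the vectors are hand-written stand-ins for real
# encoder outputs.
docs = [
    ("statute", ["theft", "penalty"], [1.0, 0.0]),
    ("template", ["lease", "contract"], [0.0, 1.0]),
]
ranked = hybrid_rank(["theft", "sentence"], [0.9, 0.1], docs)
print(ranked)
```

A multi-hop variant would presumably feed the top results back into a follow-up query against another knowledge base; that step is left out here because the summary gives no detail on it.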

Quick Start & Requirements

  • Training: Requires transformers>=4.27.1, datasets==2.13.0, evaluate==0.4.0, accelerate>=0.20.1, deepspeed==0.9.5, sentencepiece==0.1.99. Training utilizes DeepSpeed and PyTorch.
  • Inference: Requires transformers>=4.27.1, accelerate>=0.20.1, torch>=2.0.1, sentencepiece==0.1.99. Inference is performed using Hugging Face Transformers and PyTorch, with CUDA support recommended.
  • Model Weights: Available via Baidu Netdisk (link and password provided in README).
  • Knowledge Base Data: Available via Baidu Netdisk (link and password provided in README).
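Before loading the weights, it can help to confirm that the local environment matches the inference pins listed above. The helper below is a minimal sketch using only the standard library; `satisfies` and `check_requirements` are hypothetical names, and the version parser deliberately handles only the `>=` and `==` specifiers that appear in this list.

```python
import re
from importlib import metadata

# Inference pins copied from the list above.
INFERENCE_REQUIREMENTS = {
    "transformers": ">=4.27.1",
    "accelerate": ">=0.20.1",
    "torch": ">=2.0.1",
    "sentencepiece": "==0.1.99",
}


def parse_version(text):
    """Return the leading numeric components, e.g. '2.0.1+cu118' -> (2, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed, spec):
    """Check an installed version string against a '>=X' or '==X' specifier."""
    op, wanted = spec[:2], spec[2:]
    have, want = parse_version(installed), parse_version(wanted)
    # Pad with zeros so that '2.0' and '2.0.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if op == ">=":
        return have >= want
    if op == "==":
        return have == want
    raise ValueError(f"unsupported specifier: {spec!r}")


def check_requirements(requirements):
    """Map each package name to (installed_version_or_None, requirement_met)."""
    report = {}
    for name, spec in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, satisfies(installed, spec))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements(INFERENCE_REQUIREMENTS).items():
        status = "ok" if ok else "MISSING or wrong version"
        print(f"{name}: installed={installed} -> {status}")
```

For anything beyond a quick sanity check, a full specifier parser such as the one pip itself uses is a safer choice than this simplified comparison.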

Highlighted Details

  • The model is fine-tuned on a diverse dataset covering legal consultations, scenario Q&A, crime prediction, sentencing prediction, court opinions, case summarization, legal exams, and multi-turn legal dialogues.
  • Features a knowledge retrieval system that fuses information from multiple legal knowledge bases (statutes, cases, templates, books, exams, Q&A) to enhance response accuracy.
  • Includes a detailed evaluation matrix for assessing judicial, general, and safety capabilities.
  • Future work includes improving retrieval models via contrastive learning and enhancing the model's understanding of fused knowledge.

Maintenance & Community

The project is a collaborative effort between academic and industry institutions. Contact information for technical and academic collaboration is provided.

Licensing & Compatibility

The model is provided for academic research purposes and is generally not for commercial use. The README states, "本模型仅供学术研究之目的而提供,且原则上不可用于商业目的。" (This model is provided for academic research purposes only and, in principle, cannot be used for commercial purposes.)

Limitations & Caveats

Use is restricted to academic research, with commercial use excluded in principle. The accuracy, completeness, and applicability of generated content are not guaranteed; users must exercise their own judgment and bear the risks of use. The project itself notes that retrieval performance still needs improvement.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

11 stars in the last 90 days
