wisdomInterrogatory by zhihaiLLM

Legal LLM for judicial practice and efficiency

Created 2 years ago

537 stars

Top 59.2% on SourcePulse

Project Summary

智海-录问 (wisdomInterrogatory) is a legal large language model developed by Zhejiang University, Alibaba DAMO Academy, and HuaYuan Computing. It aims to enhance judicial practice, digital case construction, and virtual legal consultation services by providing a foundational capability for intelligent legal systems. The model is designed for legal professionals, researchers, and students seeking advanced AI-powered legal assistance.

How It Works

The model builds on the Baichuan-7B base model and is trained in two stages: a second round of pre-training on 40GB of legal data (documents, cases, Q&A) to inject domain knowledge, followed by instruction fine-tuning on 100k examples to enable conversational Q&A. It also incorporates a knowledge retrieval and fusion mechanism that combines statistical and semantic features to perform multi-hop retrieval across several legal knowledge bases (statutes, cases, templates, books, exams, Q&A).
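
The idea of combining statistical and semantic features for retrieval can be illustrated with a small sketch. This is not the project's implementation: the summary does not say which scoring functions or fusion rule are used. Here BM25 is assumed as the statistical feature, a character-trigram vector (`toy_embed`) stands in for a neural text encoder, and the two scores are fused with a weighted sum controlled by a hypothetical `alpha` parameter. Multi-hop retrieval is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Statistical feature (assumed): Okapi BM25 score of each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized) / n_docs) or 1.0
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores.append(score)
    return scores


def toy_embed(text):
    """Hypothetical stand-in for a neural encoder: character-trigram counts."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so the two features are comparable."""
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def hybrid_search(query, docs, alpha=0.5, embed=toy_embed):
    """Rank documents by alpha * statistical + (1 - alpha) * semantic score."""
    stat = min_max(bm25_scores(query, docs))
    q_vec = embed(query)
    sem = min_max([cosine(q_vec, embed(d)) for d in docs])
    fused = [alpha * s + (1 - alpha) * m for s, m in zip(stat, sem)]
    return sorted(enumerate(fused), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    statutes = [
        "theft of property is punished under criminal law",
        "a labor contract must be concluded in writing",
        "the lessee shall pay rent on time",
    ]
    for index, score in hybrid_search("labor contract written form", statutes):
        print(f"{score:.3f}  {statutes[index]}")
```

In a real system the `embed` argument would be replaced by a trained sentence encoder, and the value of `alpha` would be tuned on held-out queries.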

Quick Start & Requirements

Training: Requires transformers>=4.27.1, datasets==2.13.0, evaluate==0.4.0, accelerate>=0.20.1, deepspeed==0.9.5, sentencepiece==0.1.99. Training utilizes DeepSpeed and PyTorch.
Inference: Requires transformers>=4.27.1, accelerate>=0.20.1, torch>=2.0.1, sentencepiece==0.1.99. Inference is performed using Hugging Face Transformers and PyTorch, with CUDA support recommended.
Model Weights: Available via Baidu Netdisk (link and password provided in README).
Knowledge Base Data: Available via Baidu Netdisk (link and password provided in README).
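
Before running inference, the version pins above can be checked against the local environment. The following is a minimal sketch using only the standard library. The helper names (`parse_version`, `satisfies`, `check_environment`) are illustrative and not part of the project, and the version comparison is deliberately simplified; the `packaging` library handles pre-release and local version tags more robustly.

```python
import re
from importlib import metadata

# Pins copied from the inference requirements listed above.
INFERENCE_REQUIREMENTS = {
    "transformers": (">=", "4.27.1"),
    "accelerate": (">=", "0.20.1"),
    "torch": (">=", "2.0.1"),
    "sentencepiece": ("==", "0.1.99"),
}


def parse_version(text):
    """Turn '2.0.1+cu118' into (2, 0, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def satisfies(installed, op, required):
    """Support only the two operators that appear in the requirement list."""
    have, want = parse_version(installed), parse_version(required)
    return have >= want if op == ">=" else have == want


def check_environment(requirements):
    """Return a {package: status} report for the given pins."""
    report = {}
    for name, (op, required) in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if satisfies(installed, op, required):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}, need {op}{required}"
    return report


if __name__ == "__main__":
    for package, status in check_environment(INFERENCE_REQUIREMENTS).items():
        print(f"{package}: {status}")
```

The same function can be reused for the training pins by passing a second dictionary built from the training requirement line.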

Highlighted Details

The model is fine-tuned on a diverse dataset covering legal consultations, scenario Q&A, crime prediction, sentencing prediction, court opinions, case summarization, legal exams, and multi-turn legal dialogues.
Features a knowledge retrieval system that fuses information from multiple legal knowledge bases (statutes, cases, templates, books, exams, Q&A) to enhance response accuracy.
Includes a detailed evaluation matrix for assessing judicial, general, and safety capabilities.
Future work includes improving retrieval models via contrastive learning and enhancing the model's understanding of fused knowledge.
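
The summary does not say which contrastive objective the planned retrieval improvements would use. As a point of reference only, the sketch below computes the InfoNCE loss, a common choice for training retrievers, for one query given the similarity score of its relevant passage and the scores of some non-relevant passages. The function name and the `temperature` default are assumptions made for illustration.

```python
import math


def info_nce_loss(pos_score, neg_scores, temperature=0.05):
    """InfoNCE for one query: -log softmax of the positive over all candidates."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_sum = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    # The loss is small when the relevant passage outscores the others.
    print(info_nce_loss(0.9, [0.2, 0.1]))
    # It grows when a non-relevant passage scores higher than the relevant one.
    print(info_nce_loss(0.2, [0.9, 0.1]))
```

Minimizing this loss pushes the encoder to score the relevant passage above the non-relevant ones for each query.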

Maintenance & Community

The project is a collaborative effort between academic and industry institutions. Contact information for technical and academic collaboration is provided.

Licensing & Compatibility

The model is provided for academic research purposes and is generally not for commercial use. The README states, "本模型仅供学术研究之目的而提供，且原则上不可用于商业目的。" (This model is provided for academic research purposes only and, in principle, cannot be used for commercial purposes.)

Limitations & Caveats

Use is limited to academic research, and commercial use is excluded in principle. The accuracy, completeness, and applicability of generated content are not guaranteed; users must exercise their own judgment and bear the risks of using it. Retrieval performance is also noted as needing improvement.
