Legal LLM for judicial practice and efficiency
智海-录问 (wisdomInterrogatory) is a legal large language model developed by Zhejiang University, Alibaba DAMO Academy, and HuaYuan Computing. It aims to enhance judicial practice, digital case construction, and virtual legal consultation services by providing a foundational capability for intelligent legal systems. The model is designed for legal professionals, researchers, and students seeking advanced AI-powered legal assistance.
How It Works
The model builds on the Baichuan-7B base and is trained in two stages: a second round of pre-training on 40GB of legal data (documents, cases, Q&A) to inject domain knowledge, followed by instruction fine-tuning on 100k examples to enable conversational Q&A. It also incorporates a knowledge retrieval and fusion mechanism that combines statistical and semantic features for multi-hop retrieval across several legal knowledge bases (statutes, cases, templates, books, exams, Q&A).
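The retrieval idea described above can be illustrated with a minimal sketch. It is not the project's implementation: the function names, the `alpha` weighting, the toy knowledge bases, and the hashed-trigram `toy_embed` (a stand-in for a real sentence encoder) are all assumptions made for illustration. The statistical feature is a TF-IDF overlap, the semantic feature is a cosine similarity, and each hop appends its top hit to the query before searching the next knowledge base.

```python
import math
import zlib
from collections import Counter


def tokenize(text):
    return text.lower().split()


def statistical_score(query, doc, doc_freq, n_docs):
    """TF-IDF overlap: sum of tf * idf over the distinct query terms found in doc."""
    tf = Counter(tokenize(doc))
    score = 0.0
    for term in set(tokenize(query)):
        if tf[term]:
            idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
            score += tf[term] * idf
    return score


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a sentence encoder: hashed character-trigram counts."""
    vec = [0.0] * dim
    s = text.lower()
    for i in range(len(s) - 2):
        vec[zlib.crc32(s[i:i + 3].encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query, docs, alpha=0.5, top_k=1):
    """Rank docs by alpha * normalized statistical score + (1 - alpha) * semantic score."""
    doc_freq = Counter(t for d in docs for t in set(tokenize(d)))
    stat = [statistical_score(query, d, doc_freq, len(docs)) for d in docs]
    max_stat = max(stat, default=0.0) or 1.0  # avoid division by zero
    q_vec = toy_embed(query)
    sem = [cosine(q_vec, toy_embed(d)) for d in docs]
    scored = [(alpha * s / max_stat + (1 - alpha) * m, d)
              for s, m, d in zip(stat, sem, docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:top_k]]


def multi_hop(query, knowledge_bases, alpha=0.5):
    """Search each knowledge base in order; each hop's top hit expands the next query."""
    hits = []
    current = query
    for name, docs in knowledge_bases:
        top = hybrid_search(current, docs, alpha=alpha, top_k=1)
        if top:
            hits.append((name, top[0]))
            current = current + " " + top[0]
    return hits


# Toy data, invented for this example only.
statutes = [
    "Statute A: tenant must pay rent on time under the lease contract",
    "Statute B: driver at fault in a traffic accident bears liability for damages",
]
cases = [
    "Case X: landlord sued tenant for unpaid rent and the court ordered payment",
    "Case Y: court awarded damages to pedestrian injured in traffic accident",
]
knowledge_bases = [("statutes", statutes), ("cases", cases)]

hits = multi_hop("my tenant has unpaid rent", knowledge_bases)
for source, text in hits:
    print(source, "->", text)
```

In a real system the retrieved passages would then be fused into the model's prompt; that step is omitted here because the source does not describe how the fusion is done.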
Quick Start & Requirements
Training dependencies: transformers>=4.27.1, datasets==2.13.0, evaluate==0.4.0, accelerate>=0.20.1, deepspeed==0.9.5, sentencepiece==0.1.99. Training utilizes DeepSpeed and PyTorch.
Inference dependencies: transformers>=4.27.1, accelerate>=0.20.1, torch>=2.0.1, sentencepiece==0.1.99. Inference is performed using Hugging Face Transformers and PyTorch, with CUDA support recommended.
Highlighted Details
Maintenance & Community
The project is a collaborative effort between academic and industry institutions. Contact information for technical and academic collaboration is provided.
Licensing & Compatibility
The model is provided for academic research only and, in principle, may not be used commercially. The README states, "本模型仅供学术研究之目的而提供,且原则上不可用于商业目的。" (This model is provided for academic research purposes only and, in principle, may not be used for commercial purposes.)
Limitations & Caveats
Use is restricted to academic research, with commercial use excluded in principle. The accuracy, completeness, and applicability of generated content are not guaranteed; users should exercise their own judgment and bear the risks of using it. Retrieval performance is also noted as needing improvement.