LexiLaw  by CSHaitao

Chinese legal LLM for professional legal consultation

created 2 years ago
903 stars

Top 41.1% on sourcepulse

Project Summary

LexiLaw is a fine-tuned Chinese legal large language model based on ChatGLM-6B, designed to provide professional legal consultation and share expertise in domain-specific LLM fine-tuning. It aims to assist legal professionals, students, and the general public with accurate and reliable legal advice, while fostering community development of similar specialized models.

How It Works

LexiLaw is fine-tuned on a mixed dataset that combines general-domain data (BELLE 1.5M) with legal Q&A, legal statutes, reference books, and legal documents such as judgments. The mix also draws on the LawGPT_zh and Lawyer LLaMA datasets. Combining general and legal sources is intended to limit overfitting to legal text while keeping the model's general natural-language understanding and supporting comprehensive legal interpretations. The project also offers a demo that adds knowledge base augmentation via Chinese-LangChain.
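As an illustration of combining general and legal sources into one fine-tuning set, here is a minimal sketch that samples examples from several corpora by weight. The function name mix_datasets, the toy corpora, and the weights are all assumptions made for this example; the project does not state its actual mixing ratios or whether it samples rather than simply concatenating.

```python
import random

def mix_datasets(sources, weights, n_samples, seed=0):
    """Draw n_samples examples, choosing the source of each one by weight."""
    rng = random.Random(seed)
    names = list(sources)
    picked = rng.choices(names, weights=[weights[n] for n in names], k=n_samples)
    return [(name, rng.choice(sources[name])) for name in picked]

# Toy stand-ins for the corpora named above; contents and weights are made up.
sources = {
    "general": ["general instruction example"],
    "legal_qa": ["legal question with answer"],
    "statutes": ["statute article text"],
    "judgments": ["judgment excerpt"],
}
weights = {"general": 0.4, "legal_qa": 0.3, "statutes": 0.2, "judgments": 0.1}

batch = mix_datasets(sources, weights, n_samples=1000)
# Count how many examples came from each source.
counts = {name: sum(1 for src, _ in batch if src == name) for name in sources}
print(counts)
```

Raising the weight of the general source trades legal coverage for broader language ability, which is the balance the paragraph above describes.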

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Model Download: Download pre-trained weights (LexiLaw_finetune, LexiLaw_Ptuningv2, or LexiLaw_LoRA) and place them in the /model directory.
  • Run: Execute python inference_method.py for command-line interaction.
  • Demo: For knowledge-enhanced interaction, clone demo/, install its dependencies, download the text2vec model into demo/text2vec/ and the encoded knowledge bases into demo/cache/, then run python main.py.
  • Prerequisites: Python, PyTorch, DeepSpeed. The README mentions 7x 40 GB A100 GPUs as the hardware used for training.
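
Before running inference_method.py or the demo, it can help to confirm that the downloads landed where the steps above expect them. The following is a minimal sketch of such a check. The preflight function is hypothetical and not part of the repository, and the exact on-disk layout (one sub-directory per weight variant under model/) is an assumption drawn from the list above.

```python
from pathlib import Path

# Names taken from the Quick Start list; the exact on-disk layout is an assumption.
WEIGHT_VARIANTS = ("LexiLaw_finetune", "LexiLaw_Ptuningv2", "LexiLaw_LoRA")
DEMO_DIRS = ("demo/text2vec", "demo/cache")

def preflight(repo_root, need_demo=False):
    """Return a list of problems; an empty list means the layout looks ready."""
    root = Path(repo_root)
    problems = []
    model_dir = root / "model"
    found = [v for v in WEIGHT_VARIANTS if (model_dir / v).is_dir()]
    if not found:
        expected = ", ".join(WEIGHT_VARIANTS)
        problems.append(f"no weights under {model_dir} (expected one of {expected})")
    if need_demo:
        for rel in DEMO_DIRS:
            path = root / rel
            # An empty directory suggests the download step was skipped.
            if not path.is_dir() or not any(path.iterdir()):
                problems.append(f"{rel}/ is missing or empty")
    return problems

if __name__ == "__main__":
    for line in preflight(".", need_demo=True) or ["layout looks ready"]:
        print(line)
```

Run it from the repository root; pass need_demo=False when only command-line inference is needed.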

Highlighted Details

  • Fine-tuned using LoRA, P-tuning v2, and full fine-tuning methods on ChatGLM-6B.
  • Demonstrates improved legal accuracy and citation of relevant laws compared to base ChatGLM.
  • Offers a knowledge-enhanced demo using Chinese-LangChain.
  • Includes SAILER, a structure-aware pre-trained language model for legal case retrieval, accepted at SIGIR 2023.
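
The knowledge-enhanced demo mentioned above follows the general retrieval-augmentation pattern: find the passages most similar to the question and place them in the prompt ahead of it. The sketch below shows that pattern only. It swaps in a toy bag-of-words similarity for the text2vec embeddings, does not use Chinese-LangChain, and the names embed, cosine, and build_prompt as well as the placeholder passages are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_prompt(question, passages, top_k=2):
    """Rank passages by similarity to the question and prepend the best ones."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Reference material:\n{context}\n\nQuestion: {question}"

# Placeholder knowledge base entries, not real legal text.
passages = [
    "placeholder passage about lease contracts and rent",
    "placeholder passage about traffic accidents",
    "placeholder passage about rent payment deadlines",
]
prompt = build_prompt("rent payment under a lease", passages, top_k=2)
print(prompt)
```

The resulting prompt would then be passed to the language model in place of the bare question, so the answer can draw on the retrieved material.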

Maintenance & Community

  • Actively seeking contributions via GitHub Issues and Pull Requests.
  • Encourages sharing of fine-tuning experiences.
  • Contact: liht22@mails.tsinghua.edu.cn for discussions.

Licensing & Compatibility

  • License: Not explicitly stated in the README, but the project references other open-source projects.
  • Restrictions: Explicitly states "仅供学术研究使用,严禁任何商业用途" (For academic research use only, strictly prohibited for any commercial use). Model output is not guaranteed for real legal scenarios.

Limitations & Caveats

  • Strictly for academic research; commercial use is prohibited.
  • Model output accuracy is not guaranteed and should not be used in real legal situations.
  • Future work includes RLHF training and legal knowledge pre-training.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 45 stars in the last 90 days
