LaWGPT by pengxiao-song

Chinese LLaMA tuned for legal use

created 2 years ago
5,997 stars

Top 8.7% on sourcepulse

Project Summary

LaWGPT is an open-source series of large language models tuned specifically for Chinese legal knowledge. It aims to improve comprehension and task performance in the legal domain for researchers and legal professionals. The project provides both foundational models and instruction-tuned variants, offering stronger semantic understanding of legal texts.

How It Works

LaWGPT builds upon general Chinese LLMs like Chinese-LLaMA and ChatGLM. It expands the legal domain vocabulary and pre-trains on extensive Chinese legal corpora. Subsequently, it undergoes instruction fine-tuning using curated legal dialogue Q&A datasets and Chinese judicial examination datasets, improving its ability to understand and respond to legal queries.
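As a rough illustration of the second stage, a legal Q&A pair might be formatted into an instruction-tuning record along these lines. The template below is an assumption (a generic Alpaca-style prompt), not LaWGPT's documented format, and `format_example` is a hypothetical helper:

```python
# Minimal sketch of turning a legal Q&A pair into an
# instruction-tuning record for stage-two fine-tuning.
# The prompt wording is an assumption, not the project's exact template.

def format_example(question: str, answer: str) -> dict:
    """Wrap a legal question in an instruction prompt, pairing it
    with the reference answer as the training completion."""
    prompt = (
        "Below is a question about Chinese law. "
        "Write a response that answers it accurately.\n\n"
        f"### Question:\n{question}\n\n### Response:\n"
    )
    return {"prompt": prompt, "completion": answer}
```

Records like this would then be fed to the LoRA fine-tuning stage on top of the domain-pretrained base model.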

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt within a conda environment (Python 3.10 recommended).
  • Prerequisites: Requires access to base LLaMA or Chinese-LLaMA model weights, as only LoRA weights are released due to licensing.
  • Usage: Supports both a web UI (scripts/webui.sh) and command-line inference (scripts/infer.sh).
  • Resources: Refer to the technical report for detailed information.
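Because only LoRA weights are released, they must be applied to base weights you obtain separately. A minimal sketch of that step, assuming Hugging Face `transformers` and `peft` are installed (the paths are placeholders, not official repository names):

```python
# Hedged sketch: applying the released LoRA adapter to a base
# Chinese-LLaMA model. Paths are placeholders.

def load_lawgpt(base_model_path: str, lora_weights_path: str):
    """Load base LLaMA weights and fold the LaWGPT LoRA adapter in."""
    # Imports are deferred so the helper can be defined even when
    # transformers/peft are not installed.
    from transformers import LlamaForCausalLM, LlamaTokenizer
    from peft import PeftModel

    tokenizer = LlamaTokenizer.from_pretrained(base_model_path)
    base = LlamaForCausalLM.from_pretrained(base_model_path)
    # merge_and_unload() folds the low-rank updates into the base
    # weights, yielding a plain model ready for inference.
    model = PeftModel.from_pretrained(base, lora_weights_path)
    return tokenizer, model.merge_and_unload()
```

The provided `scripts/webui.sh` and `scripts/infer.sh` handle loading internally; this sketch only shows the idea behind combining the weights.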

Highlighted Details

  • Offers both a legal base model (Legal-Base-7B) and dialogue models (LaWGPT-7B-beta variants).
  • Utilizes data from Chinese legal documents and judicial exams, including self-instruct and knowledge-guided generation methods.
  • Training involves two stages: domain-specific pre-training and instruction fine-tuning.
  • Demonstrates capabilities in answering legal questions and generating case descriptions.

Maintenance & Community

The project is supported by the Nanjing University Machine Learning and Data Mining Research Group. Collaboration is encouraged via GitHub Issues.

Licensing & Compatibility

All resources are strictly for academic research; commercial use is prohibited. The project does not guarantee the accuracy of model outputs and strictly forbids their use in real legal scenarios.

Limitations & Caveats

Due to resource and data limitations, the models may exhibit weaker memory and language capabilities, potentially leading to factual inaccuracies. Initial human intent alignment is preliminary, meaning outputs might be unpredictable, harmful, or misaligned with human preferences. Self-awareness and Chinese comprehension require further enhancement.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 46 stars in the last 90 days
