LaWGPT by pengxiao-song

Chinese LLaMA tuned for legal use

created 2 years ago
5,997 stars

Top 8.7% on sourcepulse

Project Summary

LaWGPT is an open-source series of large language models tuned specifically for Chinese legal knowledge. It aims to improve comprehension and task performance in the legal domain for researchers and legal professionals. The project provides both foundational models and instruction-tuned variants, offering stronger semantic understanding of legal texts.

How It Works

LaWGPT builds upon general Chinese LLMs like Chinese-LLaMA and ChatGLM. It expands the legal domain vocabulary and pre-trains on extensive Chinese legal corpora. Subsequently, it undergoes instruction fine-tuning using curated legal dialogue Q&A datasets and Chinese judicial examination datasets, improving its ability to understand and respond to legal queries.
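As a rough illustration of the second stage, a legal Q&A pair might be formatted into an instruction-tuning record along these lines. The template below is an assumption (a generic Alpaca-style prompt), not LaWGPT's documented format, and `format_example` is a hypothetical helper:

```python
# Minimal sketch of turning a legal Q&A pair into an
# instruction-tuning record for stage-two fine-tuning.
# The prompt wording is an assumption, not the project's exact template.

def format_example(question: str, answer: str) -> dict:
    """Wrap a legal question in an instruction prompt, pairing it
    with the reference answer as the training completion."""
    prompt = (
        "Below is a question about Chinese law. "
        "Write a response that answers it accurately.\n\n"
        f"### Question:\n{question}\n\n### Response:\n"
    )
    return {"prompt": prompt, "completion": answer}
```

Records like this would then be fed to the LoRA fine-tuning stage on top of the domain-pretrained base model.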

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt within a conda environment (Python 3.10 recommended).
  • Prerequisites: Requires access to base LLaMA or Chinese-LLaMA model weights, as only LoRA weights are released due to licensing.
  • Usage: Supports both a web UI (scripts/webui.sh) and command-line inference (scripts/infer.sh).
  • Resources: Refer to the technical report for detailed information.
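Because only LoRA weights are released, they must be applied to base weights you obtain separately. A minimal sketch of that step, assuming Hugging Face `transformers` and `peft` are installed (the paths are placeholders, not official repository names):

```python
# Hedged sketch: applying the released LoRA adapter to a base
# Chinese-LLaMA model. Paths are placeholders.

def load_lawgpt(base_model_path: str, lora_weights_path: str):
    """Load base LLaMA weights and fold the LaWGPT LoRA adapter in."""
    # Imports are deferred so the helper can be defined even when
    # transformers/peft are not installed.
    from transformers import LlamaForCausalLM, LlamaTokenizer
    from peft import PeftModel

    tokenizer = LlamaTokenizer.from_pretrained(base_model_path)
    base = LlamaForCausalLM.from_pretrained(base_model_path)
    # merge_and_unload() folds the low-rank updates into the base
    # weights, yielding a plain model ready for inference.
    model = PeftModel.from_pretrained(base, lora_weights_path)
    return tokenizer, model.merge_and_unload()
```

The provided `scripts/webui.sh` and `scripts/infer.sh` handle loading internally; this sketch only shows the idea behind combining the weights.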

Highlighted Details

  • Offers both a legal base model (Legal-Base-7B) and dialogue models (LaWGPT-7B-beta variants).
  • Utilizes data from Chinese legal documents and judicial exams, including self-instruct and knowledge-guided generation methods.
  • Training involves two stages: domain-specific pre-training and instruction fine-tuning.
  • Demonstrates capabilities in answering legal questions and generating case descriptions.

Maintenance & Community

The project is supported by the Nanjing University Machine Learning and Data Mining Research Group. Collaboration is encouraged via GitHub Issues.

Licensing & Compatibility

All resources are strictly for academic research; commercial use is prohibited. The project does not guarantee the accuracy of model outputs and strictly forbids their use in real legal scenarios.

Limitations & Caveats

Due to resource and data limitations, the models may exhibit weaker memory and language capabilities, potentially leading to factual inaccuracies. Initial human intent alignment is preliminary, meaning outputs might be unpredictable, harmful, or misaligned with human preferences. Self-awareness and Chinese comprehension require further enhancement.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 46 stars in the last 90 days
