Lawyer LLaMA is an open-source project providing a large language model fine-tuned for the Chinese legal domain. It aims to enhance LLMs with Chinese legal knowledge and practical application capabilities, benefiting researchers and developers working with legal AI.
How It Works
Lawyer LLaMA is built upon LLaMA models. It first undergoes continual pretraining on a large corpus of Chinese legal texts to absorb the Chinese legal knowledge system, and is then instruction fine-tuned on curated datasets generated with ChatGPT, covering objective questions from China's National Unified Legal Professional Qualification Examination as well as legal consultations. This two-stage approach gives the model both foundational legal knowledge and the ability to apply it to concrete scenarios.
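The project's actual training recipe is not reproduced here; the sketch below only illustrates the general shape of such a two-stage pipeline (continual pretraining on raw legal text, then instruction fine-tuning) using Hugging Face Transformers. The base checkpoint, data files, and hyperparameters are placeholders, not Lawyer LLaMA's real configuration.

```python
# Illustrative two-stage pipeline: continual pretraining, then instruction SFT.
# All paths, dataset names, and hyperparameters are placeholders.
from transformers import (AutoModelForCausalLM, AutoTokenizer, Trainer,
                          TrainingArguments, DataCollatorForLanguageModeling)
from datasets import load_dataset

base = "meta-llama/Llama-2-13b-hf"          # hypothetical base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

def tokenize(batch):
    # Assumes each example exposes a single "text" field.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

# Stage 1: continual pretraining on raw Chinese legal text (causal LM objective).
legal_corpus = load_dataset("text", data_files="legal_corpus.txt")["train"]
legal_corpus = legal_corpus.map(tokenize, batched=True, remove_columns=["text"])

# Stage 2: instruction fine-tuning on exam questions and consultation dialogues,
# each example pre-rendered as one prompt+response string under "text".
sft_data = load_dataset("json", data_files="legal_instructions.jsonl")["train"]
sft_data = sft_data.map(tokenize, batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
for data, outdir in [(legal_corpus, "stage1_pretrain"), (sft_data, "stage2_sft")]:
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir=outdir, num_train_epochs=1,
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=16, bf16=True),
        train_dataset=data,
        data_collator=collator,
    )
    trainer.train()
```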
Quick Start & Requirements
- Installation: Model parameters are available for download. Inference can be run by following the provided guide (e.g., demo/run_inference_v2.md); an illustrative sketch appears after this list.
- Prerequisites: Requires the LLaMA-2 base model (for v2) or Chinese-LLaMA-13B (for v1). Specific Python dependencies are detailed in the repository.
- Resources: Model weights are substantial (13B parameters). Running inference may require significant GPU memory.
- Links: Lawyer LLaMA Technical Report, Demo/Inference Guide.
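As referenced in the Installation item above, the sketch below shows what loading the released weights and generating a response typically looks like with Hugging Face Transformers. The local model path, prompt, and generation settings are illustrative assumptions; demo/run_inference_v2.md remains the authoritative guide.

```python
# Minimal inference sketch; model path and prompt are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/lawyer-llama-13b-v2"     # local path to downloaded weights
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto")

# "How is marital property divided in a divorce?"
prompt = "请问离婚时夫妻共同财产如何分割？"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```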
Highlighted Details
- Offers two versions: lawyer-llama-13b-v2 (based on LLaMA-2 with continued Chinese pretraining) and lawyer-llama-13b-beta1.0 (based on Chinese-LLaMA-13B).
- Provides open-sourced instruction fine-tuning datasets, including GPT-4 generated legal exam answers and legal advice.
- Includes a marriage-related legal retrieval module (see the illustrative sketch after this list).
- Evaluated against other models like GPT-3.5-Turbo and Gemini-1.0-Pro, showing competitive performance in legal consultation quality.
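The repository's retrieval module itself is not reproduced here; the sketch below only illustrates the general retrieval-augmented prompting pattern it implies, with a hypothetical retrieve_articles helper and prompt format standing in for the project's actual implementation.

```python
# Illustrative retrieval-augmented prompting pattern; retrieve_articles and the
# prompt template are hypothetical stand-ins, not the project's real module.
def retrieve_articles(question: str, top_k: int = 3) -> list[str]:
    # Placeholder: a real implementation would embed the question and search an
    # index of marriage-law statutes for the most relevant articles.
    return ["《民法典》第一千零八十七条：离婚时，夫妻的共同财产由双方协议处理……"][:top_k]

def build_prompt(question: str) -> str:
    # Prepend retrieved statute text so the model can ground its answer.
    articles = "\n".join(retrieve_articles(question))
    return f"相关法条：\n{articles}\n\n用户咨询：{question}\n请结合上述法条作答。"

print(build_prompt("离婚时夫妻共同财产如何分割？"))
```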
Maintenance & Community
- The project is actively maintained, with recent updates including lawyer-llama-13b-v2 and improved datasets.
- Key contributors are listed, with guidance from Professor Yansong Feng.
- Community contributions for deployment and quantization are documented.
Licensing & Compatibility
- The project content is for academic research only and explicitly prohibits commercial or harmful uses.
- Users must adhere to the open-source licenses of any third-party code used.
Limitations & Caveats
- The project is intended for academic research and not for commercial use or providing professional legal advice.
- Data generated by ChatGPT has not been rigorously validated and may contain errors. Model outputs are not professional legal consultations.