YAYI (雅意) is a series of large language models focused on instruction fine-tuning for Chinese-language domains, aimed at providing customers with secure and reliable proprietary LLM solutions. The project builds on the LLaMA 2 and BLOOM model families and is fine-tuned on multi-domain Chinese and English instruction data, strengthening the models' Chinese foundational and analytical capabilities in areas such as media publicity, public opinion analysis, public safety, financial risk control, and urban governance.
How It Works
YAYI models are fine-tuned on millions of manually constructed, high-quality domain instruction samples covering more than a hundred natural-language instruction tasks. Their core strength lies in progressively enhanced Chinese foundational and domain-analysis capabilities, combined with integrated multi-turn dialogue and partial plugin support. Feedback from internal testing by several hundred users has further improved model performance and safety.
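The multi-turn dialogue support implies a role-tagged conversation format. Below is a minimal sketch of how such a prompt could be assembled; the role tags `<|System|>`, `<|Human|>`, and `<|YaYi|>` are assumptions modeled on the project's published examples, so confirm the exact delimiters against the README.

```python
# A minimal sketch of multi-turn prompt assembly. The role tags below
# (<|System|>, <|Human|>, <|YaYi|>) are assumptions; confirm the exact
# delimiters in the project README.
from typing import List, Tuple

def build_prompt(system: str, history: List[Tuple[str, str]], user_msg: str) -> str:
    """Concatenate prior (human, yayi) turns plus the new user message."""
    parts = [f"<|System|>:\n{system}\n"]
    for human, yayi in history:
        parts.append(f"<|Human|>:\n{human}\n")
        parts.append(f"<|YaYi|>:\n{yayi}\n")
    parts.append(f"<|Human|>:\n{user_msg}\n")
    parts.append("<|YaYi|>:\n")  # generation continues from this tag
    return "".join(parts)
```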
Quick Start & Requirements
- Install: Clone the repository, create a conda environment (`conda create --name yayi python=3.8`), activate it (`conda activate yayi`), and install dependencies (`pip install -r requirements.txt`).
- Prerequisites: Python 3.8, PyTorch, Transformers.
- Inference: Requires ~20GB of VRAM for FP16 inference on a single A100/A800/3090 GPU; see the loading sketch after this list.
- Resources: Model weights are available on Hugging Face.
- Docs: README, Hugging Face Repo
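As referenced in the inference note above, the following is a minimal FP16 loading sketch using Hugging Face Transformers. The model ID `wenge-research/yayi-7b` is an assumption; check the Hugging Face repo for the exact published names.

```python
# A minimal FP16 inference sketch with Hugging Face Transformers.
# The model ID below is an assumption, not a confirmed release name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "wenge-research/yayi-7b"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # FP16 fits in ~20GB VRAM per the note above
    device_map="auto",          # place weights on the available GPU
)

prompt = "请介绍一下雅意大模型。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```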
Highlighted Details
- Offers 7B and 13B parameter models based on LLaMA 2.
- Supports full-parameter fine-tuning and LoRA fine-tuning for both instruction and multi-turn conversation data (see the sketch after this list).
- Utilizes DeepSpeed for distributed training.
- Released 50k training data samples covering finance, security, public opinion, and media.
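As referenced above, here is a hedged sketch of LoRA fine-tuning with the PEFT library. The target module names match LLaMA-style attention projections, and the hyperparameters are illustrative defaults, not the project's exact training configuration.

```python
# A sketch of LoRA adapter setup with PEFT; module names and ranks are
# assumptions based on LLaMA-style architectures, not the project's code.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("wenge-research/yayi-7b")  # assumed ID
lora_config = LoraConfig(
    r=8,                                  # low-rank dimension
    lora_alpha=16,                        # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # LLaMA-style attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the adapter weights train
```

For full-parameter fine-tuning, training is distributed with DeepSpeed; a typical launch would invoke the `deepspeed` CLI with a JSON config, though the exact scripts and arguments live in the repository.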
Maintenance & Community
- Project actively updated with new model weights and training code.
- Mentions use of BigScience bloomz-7b1-mt and Meta Llama 2 weights.
- Training code references Databricks Dolly and Hugging Face Transformers.
- Distributed training uses Microsoft DeepSpeed.
Licensing & Compatibility
- Code License: Apache-2.0
- Data License: CC BY-NC 4.0
- Model License: YAYI model license (specific terms are not detailed in the README, but restrictions are implied).
- Restrictions: Explicitly stated for research purposes only, not for commercial use or any use that could cause societal harm.
Limitations & Caveats
The SFT models may produce factually incorrect answers, fail to identify harmful instructions, and have limitations in logical reasoning, code generation, and scientific computation. The project explicitly prohibits commercial use and any use that could cause societal harm.