YAYI by wenge-research

Chinese LLM for domain-specific tasks, based on LLaMA 2 & BLOOM

created 2 years ago
3,172 stars

Top 15.5% on sourcepulse

Project Summary

YAYI (雅意大模型) is a series of large language models focused on instruction fine-tuning for Chinese-language domains, designed to provide customers with secure and reliable domain-specific LLM solutions. Built on the LLaMA 2 and BLOOM model families and fine-tuned on multi-domain Chinese and English instruction data, it strengthens the models' Chinese foundational and analytical capabilities in areas such as media and publicity, public opinion analysis, public safety, financial risk control, and urban governance.

How It Works

The YAYI models are fine-tuned on millions of manually constructed, high-quality domain instruction samples covering more than a hundred natural language instruction tasks. Their core strength is the progressive improvement of Chinese foundational abilities and domain analysis abilities, together with integrated multi-turn conversation and partial plugin capabilities. Feedback from several hundred internal test users has further improved model performance and safety.

Quick Start & Requirements

  • Install: Clone the repository, create a conda environment (conda create --name yayi python=3.8), activate it (conda activate yayi), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.8, PyTorch, Transformers.
  • Inference: Requires ~20 GB of VRAM for FP16 inference on a single A100/A800/3090 GPU (see the sketch after this list).
  • Resources: Model weights are available on Hugging Face.
  • Docs: README, Hugging Face Repo
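
A minimal FP16 inference sketch with Hugging Face Transformers is shown below. The model ID and prompt are illustrative assumptions; check the project's Hugging Face page and README for the exact repository names and prompt template.

    # Minimal FP16 inference sketch with Hugging Face Transformers.
    # The model ID is an assumption -- see the wenge-research Hugging Face page
    # for the exact repository names; the prompt template may also differ.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "wenge-research/yayi-7b-llama2"  # assumed model ID

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # FP16 fits in ~20 GB VRAM on an A100/A800/3090
        device_map="auto",
    )

    prompt = "请介绍一下人工智能在金融风控中的应用。"  # example Chinese domain instruction
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))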

Highlighted Details

  • Offers 7B- and 13B-parameter models based on LLaMA 2.
  • Supports full-parameter fine-tuning and LoRA fine-tuning for both instruction and multi-turn conversation data (see the LoRA sketch after this list).
  • Utilizes DeepSpeed for distributed training.
  • Released 50k training data samples covering finance, security, public opinion, and media.
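
The following is a minimal LoRA setup sketch using the PEFT library; the base model, target modules, and hyperparameters are illustrative assumptions rather than the project's actual training configuration (see the released training code for that).

    # Illustrative LoRA adapter setup with the PEFT library; hyperparameters,
    # base model, and target modules are assumptions, not the project's config.
    import torch
    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base_model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-7b-hf",           # assumed base; YAYI also builds on bloomz-7b1-mt
        torch_dtype=torch.float16,
    )

    lora_config = LoraConfig(
        r=8,                                  # adapter rank (illustrative)
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"],  # attention projections in LLaMA-style blocks
        task_type="CAUSAL_LM",
    )

    model = get_peft_model(base_model, lora_config)
    model.print_trainable_parameters()        # only the LoRA adapters are trainable
    # Training then proceeds with transformers.Trainer, optionally under DeepSpeed.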

Maintenance & Community

  • Project actively updated with new model weights and training code.
  • Mentions use of BigScience bloomz-7b1-mt and Meta Llama 2 weights.
  • Training code references Databricks dolly and Hugging Face Transformers.
  • Distributed training uses Microsoft DeepSpeed.

Licensing & Compatibility

  • Code License: Apache-2.0
  • Data License: CC BY-NC 4.0
  • Model License: YAYI model license (specific terms not detailed in the README, but restrictions are implied).
  • Restrictions: Explicitly stated for research purposes only, not for commercial use or any use that could cause societal harm.

Limitations & Caveats

The SFT models may produce factually incorrect answers, fail to identify harmful instructions, and have limitations in logical reasoning, code generation, and scientific computation. The project explicitly prohibits commercial use and any use that could cause societal harm.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 90 days

Explore Similar Projects

alpaca-lora by tloen
  LoRA fine-tuning for LLaMA
  19k stars · 0.0% · created 2 years ago · updated 1 year ago
  Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Ying Sheng (Author of SGLang), and 9 more.

stanford_alpaca by tatsu-lab
  Instruction-following LLaMA model training and data generation
  30k stars · 0.1% · created 2 years ago · updated 1 year ago
  Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), John Yang (Author of SWE-bench, SWE-agent), and 13 more.