Chinese LLM value alignment research
Top 60.5% on sourcepulse
This repository provides tools and datasets for evaluating and aligning the values of Chinese Large Language Models (LLMs). It addresses concerns about LLM risks by focusing on safety and responsibility, offering resources for researchers and developers working on LLM governance and ethical AI.
How It Works
The project introduces the CValues benchmark, which assesses Chinese LLMs based on safety and responsibility criteria. It utilizes both human and automated evaluation methods, including multi-choice questions derived from curated prompt datasets. The alignment direction explores methods based on expert principles to guide LLM behavior, aiming to improve value alignment through techniques like self-instruct and supervised fine-tuning.
Quick Start & Requirements
python cvalues_eval.py --input_file <your_model_output.jsonl> --evaluator <model_name>
100PoisonMpts
and CValues-Comparison
. Some datasets are sensitive and have reduced public availability.Highlighted Details
100PoisonMpts
, the first open-source Chinese dataset for LLM governance, curated with expert input.CValues-Comparison
, a 145k pair dataset for training and evaluating reward models.Maintenance & Community
The project is associated with the ChatPLUG open-source LLM. Further details on community engagement or roadmaps are not explicitly provided in the README.
Licensing & Compatibility
The README does not explicitly state a license. Given the nature of the data (sensitive prompts, expert opinions), users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
Some datasets (safety prompts
, multi-choice safety prompts
) are not fully open-sourced due to sensitive content. The evaluation scripts require specific model outputs and manual annotation may be necessary for certain model outputs.
2 years ago
1 week