CValues by X-PLUG

Chinese LLM value alignment research

Created 2 years ago

554 stars

Top 57.8% on SourcePulse

View on GitHub

1 Expert Loves This Project

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

This repository provides tools and datasets for evaluating and aligning the values of Chinese Large Language Models (LLMs). It addresses concerns about LLM risks by focusing on safety and responsibility, offering resources for researchers and developers working on LLM governance and ethical AI.

How It Works

The project introduces the CValues benchmark, which assesses Chinese LLMs based on safety and responsibility criteria. It utilizes both human and automated evaluation methods, including multi-choice questions derived from curated prompt datasets. The alignment direction explores methods based on expert principles to guide LLM behavior, aiming to improve value alignment through techniques like self-instruct and supervised fine-tuning.

Quick Start & Requirements

Evaluation Script: python cvalues_eval.py --input_file <your_model_output.jsonl> --evaluator <model_name>
Prerequisites: Python, specific LLM evaluators (e.g., chatgpt, chatglm).
Data: Several datasets are available, including 100PoisonMpts and CValues-Comparison. Some datasets are sensitive and have reduced public availability.
Links:
- CValues Paper: https://arxiv.org/abs/2307.09705
- 100PoisonMpts Dataset: ModelScope Link

Highlighted Details

Introduces 100PoisonMpts, the first open-source Chinese dataset for LLM governance, curated with expert input.
Offers CValues-Comparison, a 145k pair dataset for training and evaluating reward models.
Provides evaluation scripts supporting multiple Chinese LLM evaluators.
Demonstrates significant improvements in value alignment metrics through expert-principle-guided fine-tuning.

Maintenance & Community

The project is associated with the ChatPLUG open-source LLM. Further details on community engagement or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the nature of the data (sensitive prompts, expert opinions), users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some datasets (safety prompts, multi-choice safety prompts) are not fully open-sourced due to sensitive content. The evaluation scripts require specific model outputs and manual annotation may be necessary for certain model outputs.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

4 stars in the last 30 days