CValues by X-PLUG

Chinese LLM value alignment research

created 2 years ago
530 stars

Top 60.5% on sourcepulse

Project Summary

This repository provides tools and datasets for evaluating and aligning the values of Chinese Large Language Models (LLMs). It addresses concerns about LLM risks by focusing on safety and responsibility, offering resources for researchers and developers working on LLM governance and ethical AI.

How It Works

The project introduces the CValues benchmark, which assesses Chinese LLMs against safety and responsibility criteria. It combines human and automated evaluation, the latter using multiple-choice questions derived from curated prompt datasets. On the alignment side, the project explores methods that use expert-written principles to guide LLM behavior, applying techniques such as self-instruct and supervised fine-tuning to improve value alignment.
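The automated, multiple-choice part of the evaluation can be pictured as a simple accuracy computation. The sketch below is not the repository's scoring code; it assumes, hypothetically, that each question has a single gold option letter and that the model's chosen letter has already been extracted from its output.

```python
from typing import Iterable


def multi_choice_accuracy(predictions: Iterable[str], gold: Iterable[str]) -> float:
    """Fraction of questions where the predicted option matches the gold option.

    Hypothetical helper: option labels are compared case-insensitively
    after stripping surrounding whitespace.
    """
    pairs = list(zip(predictions, gold))
    if not pairs:
        raise ValueError("no examples to score")
    correct = sum(p.strip().upper() == g.strip().upper() for p, g in pairs)
    return correct / len(pairs)


# Two of the three predictions match their gold labels.
accuracy = multi_choice_accuracy(["A", "b", "A"], ["A", "B", "B"])
```

Outputs from which no option letter can be extracted would need separate handling, for example manual annotation.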

Quick Start & Requirements

  • Evaluation Script: python cvalues_eval.py --input_file <your_model_output.jsonl> --evaluator <model_name>
  • Prerequisites: Python and access to one of the supported LLM evaluators (e.g., chatgpt, chatglm).
  • Data: Several datasets are available, including 100PoisonMpts and CValues-Comparison. Some datasets contain sensitive content and are only partially released.
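The evaluation script takes a JSONL file of model outputs via --input_file. A minimal sketch of producing such a file follows. The field names "prompt" and "response" are assumptions made for illustration; the schema the script actually expects should be checked against the repository.

```python
import json


def dumps_jsonl(records):
    """Serialize an iterable of dicts to JSONL text, one JSON object per line."""
    # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes.
    return "".join(json.dumps(r, ensure_ascii=False) + "\n" for r in records)


def loads_jsonl(text):
    """Parse JSONL text back into a list of dicts, skipping blank lines."""
    # Split on "\n" only: json.dumps escapes newlines inside strings, so
    # every literal "\n" in the text is a record boundary.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical records; replace with your model's real prompts and outputs.
records = [
    {"prompt": "示例问题一", "response": "示例回答一"},
    {"prompt": "示例问题二", "response": "示例回答二"},
]
jsonl_text = dumps_jsonl(records)
```

Writing jsonl_text to disk with open(path, "w", encoding="utf-8") yields a file that can be passed as <your_model_output.jsonl>.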

Highlighted Details

  • Introduces 100PoisonMpts, the first open-source Chinese dataset for LLM governance, curated with expert input.
  • Offers CValues-Comparison, a dataset of 145k response pairs for training and evaluating reward models.
  • Provides evaluation scripts supporting multiple Chinese LLM evaluators.
  • Reports improved value alignment metrics after expert-principle-guided fine-tuning.
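Pairwise data such as CValues-Comparison is commonly used to train reward models with a ranking loss. The sketch below shows the standard pairwise logistic (Bradley-Terry style) loss as one plausible choice; the summary does not state which objective the project uses, and the pairing of a preferred with a less-preferred response is an assumption here.

```python
import math


def pairwise_ranking_loss(score_preferred: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the reward model scores the preferred response
    well above the rejected one, and large when the order is reversed.
    """
    margin = score_preferred - score_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign of m so that
    # exp() is never called on a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for one (preferred, rejected) pair.
loss_correct_order = pairwise_ranking_loss(2.0, 0.0)
loss_wrong_order = pairwise_ranking_loss(0.0, 2.0)
```

Averaging this loss over all pairs gives a training objective; the fraction of pairs with a positive margin gives a simple evaluation metric for a trained reward model.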

Maintenance & Community

The project is associated with the ChatPLUG open-source LLM. Further details on community engagement or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the nature of the data (sensitive prompts, expert opinions), users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

Some datasets (safety prompts, multi-choice safety prompts) are not fully open-sourced because of their sensitive content. The evaluation scripts take previously generated model outputs as input, and some of those outputs may need manual annotation.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days
