z-bench  by zhenbench

Chinese LLM prompt dataset for non-technical users

created 2 years ago
496 stars

Top 63.4% on sourcepulse

Project Summary

Z-Bench is a curated dataset of 300 Chinese prompts designed for non-technical users to qualitatively evaluate the conversational abilities of large language models (LLMs). Developed by Zhenfund, it aims to provide a practical, user-friendly alternative to complex academic benchmarks, focusing on real-world conversational AI performance.

How It Works

Z-Bench categorizes prompts into "Basic," "Advanced," and "Specialized" abilities, drawing from existing NLP benchmarks, user-collected examples, and observed emergent LLM capabilities. This approach prioritizes coverage of diverse Natural Language Processing tasks relevant to conversational AI, offering a more accessible evaluation method than automated, academically rigorous test suites.

Quick Start & Requirements

  • Dataset Access: Prompts are available in CSV format via Tencent Docs: https://docs.qq.com/sheet/DTEFsdkNERVVtR3BX
  • Requirements: No specific software or hardware prerequisites are mentioned beyond the ability to process CSV files and interact with LLMs.
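A minimal Python sketch of the workflow implied above: load the prompts from a CSV export, group them by category, and collect one response per prompt for manual review. The column names `category` and `prompt`, the helper names, and the sample rows are assumptions for illustration; the actual sheet layout is not described here, and the sample rows are made up rather than taken from Z-Bench. The `ask` callable stands in for whatever LLM client you use.

```python
import csv
import io
from collections import defaultdict
from typing import Callable, Dict, List


def load_prompts(csv_text: str) -> Dict[str, List[str]]:
    """Group prompts by category.

    Assumes (hypothetically) that the CSV has 'category' and 'prompt' columns.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        grouped[row["category"].strip()].append(row["prompt"].strip())
    return dict(grouped)


def collect_responses(
    grouped: Dict[str, List[str]], ask: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Send every prompt to `ask` and keep the replies for human review."""
    rows = []
    for category, prompts in grouped.items():
        for prompt in prompts:
            rows.append(
                {"category": category, "prompt": prompt, "response": ask(prompt)}
            )
    return rows


# Made-up rows standing in for the exported sheet (not real Z-Bench prompts).
SAMPLE_CSV = (
    "category,prompt\n"
    "Basic,你好，请介绍一下你自己\n"
    "Advanced,请用三句话总结这段文字\n"
    "Basic,把早上好翻译成英文\n"
)


def echo_model(prompt: str) -> str:
    """Placeholder model; replace with a call to the LLM under evaluation."""
    return f"[placeholder reply to: {prompt}]"


grouped = load_prompts(SAMPLE_CSV)
results = collect_responses(grouped, echo_model)
for row in results:
    print(row["category"], "|", row["prompt"], "|", row["response"])
```

Because the evaluation is qualitative, the sketch stops at collecting responses; judging them is left to a human reader rather than an automated metric.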

Highlighted Details

  • 300 Chinese prompts covering basic, advanced, and specialized LLM capabilities.
  • Designed for qualitative, non-technical evaluation of conversational AI products.
  • Combines academic benchmarks, practical examples, and emergent LLM abilities.

Maintenance & Community

  • Developed by Zhenfund with contributions from several individuals.
  • The project aims for continuous improvement based on user feedback.

Licensing & Compatibility

  • The dataset is © 2023 Zhenfund. The README does not state specific licensing terms, though the dataset appears intended for evaluation use.

Limitations & Caveats

The dataset is intended for qualitative assessment and may not be suitable for rigorous academic benchmarking. The creators acknowledge that it may contain omissions and content that looks amateurish from a professional NLP perspective, and they plan future updates based on feedback.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 90 days
