ruozhiba by Leymore

Dataset for LLM entertainment using Ruozhiba posts

Created 2 years ago
745 stars

Top 46.6% on SourcePulse

Project Summary

This repository provides a dataset of posts from the Baidu "Ruozhiba" (Weak-minded Bar) forum, intended to inspire creative and entertaining uses of Large Language Models (LLMs) like ChatGPT. It is primarily for researchers and developers exploring novel LLM applications.

How It Works

The project curates and organizes posts from the Ruozhiba forum, categorizing them by quality and type (full posts or titles). This structured data serves as a unique corpus for training or fine-tuning LLMs, enabling them to generate humorous, nonsensical, or creatively "weak-minded" text, thereby exploring the boundaries of LLM creativity and safety.
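As a minimal sketch of how such a corpus might be prepared for fine-tuning, the snippet below turns raw post titles into chat-style JSONL records. The sample titles and the record schema are illustrative assumptions, not the repository's actual files or format:

```python
import json

# Hypothetical sample titles standing in for lines of the dataset's
# title files (assumed to be plain text, one post title per line).
sample_titles = [
    "既然监狱里全是罪犯，为什么不去监狱里抓人？",
    "生鱼片是死鱼片",
    "",              # blank lines are skipped
    "生鱼片是死鱼片",  # duplicates are dropped
]

def to_finetune_records(titles):
    """Deduplicate titles and wrap each one as a chat-style record."""
    seen = set()
    records = []
    for title in titles:
        title = title.strip()
        if not title or title in seen:
            continue
        seen.add(title)
        records.append({
            "messages": [
                {"role": "user", "content": title},
                # Left empty here; a human or a stronger model would
                # fill in the response during data construction.
                {"role": "assistant", "content": ""},
            ]
        })
    return records

records = to_finetune_records(sample_titles)
print(len(records))  # 2 unique, non-empty titles
print(json.dumps(records[0], ensure_ascii=False))
```

The chat-message schema mirrors a common fine-tuning JSONL convention; any downstream trainer would need the records adapted to its own expected format.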

Highlighted Details

  • Dataset includes 1.3k annual best posts (2018-2021), 2.6k recommended titles (up to 2023.04.30), and 81.7k general titles (up to 2023.04.30).
  • A separate collection of 2.4k question-type posts is available via a Tencent Docs link.
  • The data is intended to spark ideas for entertaining LLM usage.

Maintenance & Community

The project acknowledges the administrators and members of the Ruozhiba forum for their content contributions. No specific community channels or active maintenance indicators are provided.

Licensing & Compatibility

The repository does not specify a license. The data is sourced from a public forum, but its use for commercial purposes or integration into closed-source projects may require further investigation into the forum's terms of service and copyright.

Limitations & Caveats

The dataset is specific to the "Ruozhiba" forum's unique content style and may not generalize well to other domains. The lack of a specified license poses potential legal and compatibility issues for downstream use.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
