shiji-kb by baojie

AI-driven knowledge graph construction from ancient texts

Created 1 year ago

2,286 stars

Top 19.1% on SourcePulse

Project Summary

This project transforms classical Chinese texts, starting with the Shiji, into a structured knowledge graph using AI agents and a novel "Agentic Ontology." It targets researchers and power users, enabling the discovery of deep historical patterns, contradictions, and logical inferences inaccessible through traditional analysis, making ancient literature more navigable and actionable.

How It Works

The core is "Agentic Ontology," an AI-driven, bottom-up extraction paradigm that replaces expert-designed knowledge structures. Methodologies are codified in "SKILL" documents, which AI agents execute through a multi-stage pipeline. The approach prioritizes rapid, iterative refinement and emergent knowledge, shifting effort from up-front schema design to evolving an ontology that the AI extracts from the text.
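To make the idea of skill documents driving a multi-stage pipeline concrete, here is a minimal sketch. It is not the project's code: the `Skill` class, `run_pipeline`, and `toy_agent` are hypothetical names, and the agent is an offline stub rather than a call to a language model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an AI agent: it receives a skill's instructions
# and the current working data, and returns the transformed data. A real
# agent would call a language model; this stub keeps the example offline.
Agent = Callable[[str, dict], dict]


@dataclass
class Skill:
    name: str          # stage name (illustrative, not taken from the repo)
    instructions: str  # methodology text the agent is asked to follow


def run_pipeline(skills: List[Skill], agent: Agent, data: dict) -> dict:
    """Apply each skill in order, feeding each stage's output to the next."""
    for skill in skills:
        data = agent(skill.instructions, data)
        data.setdefault("history", []).append(skill.name)
    return data


def toy_agent(instructions: str, data: dict) -> dict:
    """Toy agent: 'extracts' entities by splitting the text on whitespace."""
    out = dict(data)
    if "entities" in instructions:
        out["entities"] = sorted(set(out["text"].split()))
    if "count" in instructions:
        out["entity_count"] = len(out.get("entities", []))
    return out


skills = [
    Skill("extract", "list the entities in the text"),
    Skill("summarize", "count what was found"),
]
result = run_pipeline(skills, toy_agent, {"text": "Liu Bang Xiang Yu Liu Bang"})
```

Because each stage is just text plus an ordering, changing the methodology in this sketch means editing a skill's instructions rather than the pipeline code, which is one way to read the "iterative refinement" emphasis above.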

Quick Start & Requirements

Clone the repo (git clone https://github.com/baojie/shiji-kb.git), install dependencies (pip install -r requirements.txt), and run the HTML generation scripts (render_shiji_html.py, generate_all_chapters.py). An online demo is available at https://baojie.github.io/shiji-kb. Extensive methodology documentation is provided in two PDF collections (425 pages of Meta-Skills, 438 pages of Pipeline Skills).

Highlighted Details

Processes 577,000 characters of the Shiji into 15,190 entities, 3,197 events, and 7,637 relationships.
Uncovers over 20 cross-chapter insights, including historical contradictions and logical paradoxes.
Features an interactive "Shiji Metro Map" with 130 timelines and 3,197 event nodes.
Employs a structured reflection loop for AI-driven quality improvement, achieving high accuracy through iterative refinement.
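The entity, event, and relationship counts above suggest a graph-shaped data model. The sketch below shows one plausible shape for such a model, plus a simple consistency check of the kind an iterative quality loop could run. All class names, fields, and the `dangling_references` check are assumptions made for illustration; the repository's actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    id: str
    name: str
    kind: str  # e.g. "person" or "place"; categories are illustrative


@dataclass
class Event:
    id: str
    description: str
    participants: List[str]     # entity ids
    year: Optional[int] = None  # negative = BCE; None when no year is known


@dataclass
class Relationship:
    source: str     # entity id
    target: str     # entity id
    predicate: str


@dataclass
class KnowledgeGraph:
    entities: Dict[str, Entity] = field(default_factory=dict)
    events: List[Event] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def dangling_references(self) -> List[str]:
        """Return ids used by events or relationships but never defined."""
        referenced = set()
        for ev in self.events:
            referenced.update(ev.participants)
        for rel in self.relationships:
            referenced.update((rel.source, rel.target))
        return sorted(referenced - set(self.entities))


# Sample values for illustration only.
kg = KnowledgeGraph()
kg.entities["e1"] = Entity("e1", "Liu Bang", "person")
kg.entities["e2"] = Entity("e2", "Xiang Yu", "person")
kg.events.append(Event("ev1", "Feast at Hong Gate", ["e1", "e2"], year=-206))
kg.relationships.append(Relationship("e1", "e3", "rival_of"))  # "e3" is undefined
missing = kg.dangling_references()
```

Here `missing` contains the undefined id `e3`, the sort of finding that could be fed back into a correction pass.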

Maintenance & Community

Led by Baojie with AI collaboration from Anthropic's Claude models. Contributions are welcomed via GitHub Issues for data refinement, code improvements, and discussion.

Licensing & Compatibility

Annotated data is CC BY-NC-SA 4.0 (non-commercial). Analysis scripts are MIT licensed (permissive). The original Shiji text is public domain.

Limitations & Caveats

AI-generated annotations contain errors caused by textual ambiguity, evolving annotation standards, and complex reasoning. Known issues include fuzzy entity boundaries, inconsistent event granularity, potential over-inference, and ~1.3% of events lacking a specific Gregorian year. The project addresses these through a structured, iterative, AI-driven correction process that prioritizes speed and continuous improvement over initial perfection.
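As a rough sense of scale, the ~1.3% figure can be combined with the 3,197 events reported earlier. The calculation below is only an estimate: the percentage is approximate, so the resulting count is too.

```python
total_events = 3197    # event count reported in the highlights
undated_share = 0.013  # "~1.3%" as stated; an approximate figure

# 3197 * 0.013 = 41.561, which rounds to 42 events without a Gregorian year
approx_undated = round(total_events * undated_share)
approx_dated = total_events - approx_undated
```

Under these assumptions, roughly 42 events lack a specific year and about 3,155 carry one.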

Health Check

Last Commit

1 week ago

Responsiveness

Inactive

Star History

257 stars in the last 30 days