baojie
AI-driven knowledge graph construction from ancient texts
Top 38.4% on SourcePulse
Summary
This project transforms classical Chinese texts, starting with the Shiji, into a structured knowledge graph using AI agents and a novel "Agentic Ontology." It targets researchers and power users, enabling the discovery of deep historical patterns, contradictions, and logical inferences inaccessible through traditional analysis, making ancient literature more navigable and actionable.
How It Works
The core is "Agentic Ontology," an AI-driven, bottom-up extraction paradigm replacing expert-designed knowledge structures. Methodologies are codified in "SKILL" documents executed by AI agents via a multi-stage pipeline. This approach prioritizes rapid, iterative refinement and emergent knowledge, shifting the focus from initial design to AI-extracted ontology evolution.
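The bottom-up idea can be illustrated with a minimal sketch. It is not the project's code: the names `Skill`, `Ontology`, `toy_agent`, and `run_pipeline` are hypothetical, and the "agent" is a keyword matcher standing in for a model call. It only shows how entity types can accumulate from extraction results, one pipeline stage per skill, rather than being fixed in advance.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    """Hypothetical stand-in for a SKILL document: a name plus instructions."""
    name: str
    instructions: str


@dataclass
class Ontology:
    """Entity types are added as they are extracted, not designed up front."""
    entity_types: dict[str, set[str]] = field(default_factory=dict)

    def add(self, entity_type: str, mention: str) -> None:
        self.entity_types.setdefault(entity_type, set()).add(mention)


# An agent is any callable mapping (skill, text) to (entity_type, mention) pairs.
Agent = Callable[[Skill, str], list[tuple[str, str]]]


def toy_agent(skill: Skill, text: str) -> list[tuple[str, str]]:
    """Placeholder for a model call: tags words listed in the skill's instructions."""
    keywords = set(skill.instructions.split())
    return [(skill.name, word) for word in text.split() if word in keywords]


def run_pipeline(skills: list[Skill], passages: list[str], agent: Agent) -> Ontology:
    """Run one stage per skill over every passage and merge the results."""
    ontology = Ontology()
    for skill in skills:
        for passage in passages:
            for entity_type, mention in agent(skill, passage):
                ontology.add(entity_type, mention)
    return ontology


# Illustrative input only; real skills and passages would be far richer.
skills = [
    Skill("person", "Liu_Bang Xiang_Yu"),
    Skill("place", "Pengcheng"),
]
passages = ["Xiang_Yu defeated Liu_Bang at Pengcheng"]
result = run_pipeline(skills, passages, toy_agent)
```

In this sketch, refining a skill's instructions and rerunning the pipeline is what changes the ontology, which mirrors the shift from up-front design to iterative extraction described above.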
Quick Start & Requirements
Clone the repo (git clone https://github.com/baojie/shiji-kb.git), install dependencies (pip install -r requirements.txt), and run the HTML generation scripts (render_shiji_html.py, generate_all_chapters.py). An online demo is available at https://baojie.github.io/shiji-kb. Extensive methodology documentation is provided in two PDF collections (425 pages of Meta-Skills, 438 pages of Pipeline Skills).
Highlighted Details
Maintenance & Community
Led by Baojie with AI collaboration from Anthropic's Claude models. Contributions are welcomed via GitHub Issues for data refinement, code improvements, and discussion.
Licensing & Compatibility
Annotated data is CC BY-NC-SA 4.0 (non-commercial). Analysis scripts are MIT licensed (permissive). The original Shiji text is public domain.
Limitations & Caveats
AI-generated annotations contain errors due to text ambiguity, evolving standards, and complex reasoning. Known issues include fuzzy entity boundaries, inconsistent event granularity, potential over-inference, and ~1.3% of events lacking specific Gregorian years. The project addresses this via a structured, iterative AI-driven correction process, prioritizing speed and continuous improvement over initial perfection.
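A figure such as the share of events without a Gregorian year can be tracked with a simple check. The sketch below assumes a hypothetical record layout (a dict with an optional `year` key, negative values for BCE); the project's actual data format may differ, and the sample records are made up for illustration.

```python
def missing_year_rate(events: list[dict]) -> float:
    """Return the fraction of events whose 'year' is absent or None."""
    if not events:
        return 0.0
    missing = sum(1 for event in events if event.get("year") is None)
    return missing / len(events)


# Made-up sample records; the field names are assumptions, not the project's schema.
events = [
    {"id": "e1", "year": -205},
    {"id": "e2", "year": -202},
    {"id": "e3", "year": None},
    {"id": "e4"},
]
rate = missing_year_rate(events)
```

Recomputing a metric like this after each correction pass gives a rough signal of whether iterative refinement is reducing the known gaps.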