Docs2KG by AI4WA

CLI tool for knowledge graph construction from documents

created 1 year ago
316 stars

Top 86.7% on sourcepulse

Project Summary

Docs2KG offers a unified approach to constructing knowledge graphs from diverse document types, targeting researchers and developers who need to extract structured information from unstructured text. It leverages a human-LLM collaborative framework to improve the quality and efficiency of knowledge graph generation.

How It Works

Docs2KG employs a hybrid bottom-up and top-down strategy, integrating Large Language Models (LLMs) for knowledge graph and ontology construction. It categorizes knowledge into MetaKG (document metadata), LayoutKG (document structure), and SemanticKG (content entities and relations). A key feature is its human-LLM collaborative interface, enabling iterative refinement of the knowledge graph based on human feedback, which in turn enhances the LLM's performance.
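
The three layers and the feedback loop described above can be pictured with a small data model. This is only an illustrative sketch: the class names (Triple, DocumentKG), the layer labels, the revise method, and the sample entities are all hypothetical and are not taken from the Docs2KG codebase.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Triple:
    """One (subject, relation, object) edge tagged with its layer."""
    subject: str
    relation: str
    obj: str
    layer: str  # "meta", "layout", or "semantic" (hypothetical labels)


@dataclass
class DocumentKG:
    """Hypothetical container holding all three layers for one document."""
    doc_id: str
    triples: List[Triple] = field(default_factory=list)

    def add(self, layer: str, subject: str, relation: str, obj: str) -> None:
        self.triples.append(Triple(subject, relation, obj, layer))

    def by_layer(self, layer: str) -> List[Triple]:
        return [t for t in self.triples if t.layer == layer]

    def revise(self, old: Triple, new: Triple) -> bool:
        """Apply one human correction; return True if the old triple was found."""
        for i, t in enumerate(self.triples):
            if t == old:
                self.triples[i] = new
                return True
        return False


def build_example() -> DocumentKG:
    kg = DocumentKG("report.pdf")
    # MetaKG: facts about the document itself
    kg.add("meta", "report.pdf", "has_author", "J. Smith")
    # LayoutKG: how the document is structured
    kg.add("layout", "report.pdf", "has_section", "section-1")
    kg.add("layout", "section-1", "has_paragraph", "para-1")
    # SemanticKG: entities and relations found in the content
    kg.add("semantic", "para-1", "mentions", "Acme Corp")
    kg.add("semantic", "Acme Corp", "acquired", "Widget Ltd")
    return kg
```

In this sketch a reviewer's correction is a call to revise, which swaps one extracted triple for a corrected one. How Docs2KG itself records feedback and passes it back to the LLM is not described in the summary.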

Quick Start & Requirements

  • Installation: Run pip install Docs2KG, then python -m spacy download en_core_web_sm to fetch the spaCy English model.
  • Prerequisites: Python environment, spaCy English model. LLM access (e.g., via Ollama) is required for agent-based processing.
  • Usage: Set the CONFIG_FILE environment variable, then run one of the CLI commands: docs2kg process-document, docs2kg batch-process, docs2kg list-formats, or docs2kg neo4j.
  • Documentation: Detailed setup and tutorials are available in the official documentation.
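
The usage steps above can be wrapped in a short Python helper that sets CONFIG_FILE and invokes the CLI. This is a minimal sketch, not part of Docs2KG: the helper names and the config.yml path are placeholders, and since the summary does not list the arguments each subcommand accepts, any extra arguments are passed through unchanged.

```python
import os
import shutil
import subprocess

# Subcommand names as listed in the summary above.
KNOWN_SUBCOMMANDS = {"process-document", "batch-process", "list-formats", "neo4j"}


def docs2kg_command(subcommand, *args):
    """Build the argv list for a docs2kg subcommand (hypothetical helper)."""
    if subcommand not in KNOWN_SUBCOMMANDS:
        raise ValueError(f"unknown subcommand: {subcommand}")
    return ["docs2kg", subcommand, *args]


def run_docs2kg(subcommand, *args, config_file="config.yml"):
    """Run docs2kg with CONFIG_FILE set; return None if the CLI is not on PATH.

    config_file is a placeholder path; point it at your own configuration.
    """
    if shutil.which("docs2kg") is None:
        return None
    # Copy the current environment and add CONFIG_FILE for the child process.
    env = dict(os.environ, CONFIG_FILE=config_file)
    return subprocess.run(
        docs2kg_command(subcommand, *args),
        env=env,
        capture_output=True,
        text=True,
        check=False,
    )
```

For example, run_docs2kg("list-formats") would return a CompletedProcess whose stdout holds whatever the CLI prints, or None when docs2kg is not installed. Consult the official documentation for the arguments that process-document, batch-process, and neo4j expect.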

Highlighted Details

  • Supports heterogeneous document formats: PDF, DOCX, HTML, EPUB.
  • Provides a human-LLM collaborative interface for iterative knowledge graph refinement.
  • Outputs knowledge graphs suitable for downstream applications like RAG.
  • Includes metrics for evaluating automatic construction quality.

Maintenance & Community

The project is associated with AI4WA. Further community or maintenance details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. It provides an arXiv citation, suggesting it is research-oriented. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is presented as a research contribution (arXiv:2406.02962), implying it may be in an early stage of development. Specific limitations regarding supported LLMs, scalability, or robustness in production environments are not detailed.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 28 stars in the last 90 days
