itext2kg by AuvaLab

Python package for incremental knowledge graph construction using LLMs

created 1 year ago
779 stars

Top 45.7% on sourcepulse

Project Summary

iText2KG is a Python package for incrementally constructing knowledge graphs from text documents using large language models. It targets researchers and developers needing to extract and structure information, offering zero-shot entity and relation extraction, entity disambiguation, and integration with Neo4j for visualization. The primary benefit is automated, consistent knowledge graph creation from diverse text sources.

How It Works

iText2KG employs a modular architecture: a Document Distiller reformulates raw text into semantic blocks based on user-defined schemas, improving the signal-to-noise ratio. An Incremental Entity Extractor identifies unique entities and resolves duplicates using cosine similarity for disambiguation. An Incremental Relation Extractor then identifies relationships between those entities. Finally, a Graph Integrator populates a Neo4j database, enabling visualization. The pipeline leverages LLMs for extraction and LangChain for model compatibility; recent updates focus on mitigating LLM hallucinations and on entity embeddings that combine name and label vectors with configurable weights.
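The incremental entity-resolution step described above can be illustrated with a small, self-contained sketch. Toy 2-D vectors stand in for real embeddings, and the function names, the 0.6/0.4 name-vs-label weights, and the 0.95 threshold are illustrative assumptions, not the package's actual API:

```python
import math

def cosine(u, v):
    # Standard cosine similarity between two vectors
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def entity_similarity(e1, e2, name_weight=0.6, label_weight=0.4):
    # Weighted combination of name-embedding and label-embedding similarity
    return (name_weight * cosine(e1["name_emb"], e2["name_emb"])
            + label_weight * cosine(e1["label_emb"], e2["label_emb"]))

def merge_incrementally(existing, new_entities, threshold=0.95):
    # Incremental construction: compare each new entity against the
    # already-resolved set; below-threshold entities are added as new,
    # above-threshold ones are treated as duplicates of a known entity.
    resolved = list(existing)
    for ent in new_entities:
        best_sim = max((entity_similarity(ent, known) for known in resolved),
                       default=0.0)
        if best_sim < threshold:
            resolved.append(ent)  # genuinely new entity
    return resolved

existing = [{"name": "Paris", "name_emb": [1, 0], "label_emb": [1, 0]}]
new = [
    {"name": "paris", "name_emb": [0.99, 0.01], "label_emb": [1, 0]},  # duplicate
    {"name": "Lyon", "name_emb": [0, 1], "label_emb": [1, 0]},         # new entity
]
merged = merge_incrementally(existing, new)
# "paris" resolves to the existing "Paris"; "Lyon" is added as new
```

In the real package the embeddings come from a LangChain embedding model, but the merge logic follows the same compare-then-add pattern.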

Quick Start & Requirements

  • Install via pip: pip install itext2kg
  • Requires Python 3.9+.
  • Compatible with all LangChain-supported chat and embedding models (e.g., Mistral, OpenAI).
  • Requires Neo4j for graph visualization.
  • Example usage with Mistral or OpenAI models is provided.

Highlighted Details

  • Zero-shot entity and relation extraction across domains.
  • Incremental KG construction and updates.
  • Entity disambiguation using embedded names and labels (configurable weights).
  • Mitigation strategies for LLM hallucination (entity replacement, re-prompting).
  • Integration with Neo4j for visualization.
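The entity-replacement strategy for hallucination mitigation can be sketched as follows. This is a simplified stand-in: string similarity via `difflib` replaces the package's embedding-based matching, and the 0.8 cutoff is an illustrative assumption. Triples whose endpoints cannot be grounded are dropped here; re-prompting the LLM is the alternative the project mentions:

```python
import difflib

def repair_relations(relations, known_entities, cutoff=0.8):
    """Replace hallucinated entity mentions in (head, relation, tail)
    triples with the closest known entity name; drop triples that
    cannot be grounded in the resolved entity set."""
    repaired = []
    for head, rel, tail in relations:
        fixed = []
        for name in (head, tail):
            if name in known_entities:
                fixed.append(name)
                continue
            # Entity replacement: snap the mention to its nearest known entity
            match = difflib.get_close_matches(name, known_entities,
                                              n=1, cutoff=cutoff)
            if not match:
                fixed = None  # ungroundable endpoint: discard the triple
                break
            fixed.append(match[0])
        if fixed:
            repaired.append((fixed[0], rel, fixed[1]))
    return repaired

known = ["Marie Curie", "Radium", "Sorbonne"]
relations = [
    ("Marie Curei", "discovered", "Radium"),   # typo: snapped to "Marie Curie"
    ("Marie Curie", "worked_at", "Sorbone"),   # typo: snapped to "Sorbonne"
    ("Einstein", "discovered", "Radium"),      # hallucinated entity: dropped
]
repaired = repair_relations(relations, known)
```

The key invariant is that every relation endpoint in the final graph references a resolved entity, which keeps incremental updates consistent.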

Maintenance & Community

  • Accepted at WISE 2024.
  • Open to community contributions.
  • Citation provided for the associated arXiv preprint.

Licensing & Compatibility

  • No explicit license mentioned in the README.
  • Compatible with commercial LLM APIs (OpenAI, Mistral) and open-source models via LangChain.

Limitations & Caveats

  • The README does not specify a license, which may impact commercial use.
  • While designed to mitigate hallucinations, LLM-generated content inherently carries a risk of inaccuracies.
  • Performance and accuracy are dependent on the chosen LLM and embedding models.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 10
  • Star History: 83 stars in the last 90 days
