knowledge_graph by rahulnyk

Knowledge graph pipeline for text corpus analysis

Created 2 years ago

2,996 stars

Top 15.7% on SourcePulse

Project Summary

This project provides a Python pipeline for converting any text corpus into a knowledge graph. It targets researchers and developers interested in Graph Augmented Generation (GRAG) or knowledge-graph-based question answering. By representing entities and their relationships as a graph, it aims to support deeper text analysis and richer conversational AI.

How It Works

The pipeline splits the text into chunks, extracts concepts (rather than just named entities) from each chunk using a local LLM (Mistral 7B OpenOrca), and infers relationships from co-occurrence: two concepts that appear in the same chunk are linked by an edge. When the same pair of concepts is linked more than once, the edges are merged, with their weights combined and their relationship descriptions concatenated. The system also calculates node degree and community membership, which drive node size and color in the visualization.
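The chunking and co-occurrence steps described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names split_into_chunks and cooccurrence_edges, the character-based chunk size, and the choice to add 1 to the weight per shared chunk are all assumptions made for the example. The per-chunk concept lists stand in for what the LLM would return.

```python
from collections import defaultdict
from itertools import combinations


def split_into_chunks(text: str, chunk_size: int = 200) -> list[str]:
    """Split text into chunks of at most ~chunk_size characters, on word boundaries."""
    chunks, current = [], ""
    for word in text.split():
        # Start a new chunk when adding this word would exceed the limit.
        if current and len(current) + 1 + len(word) > chunk_size:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}" if current else word
    if current:
        chunks.append(current)
    return chunks


def cooccurrence_edges(chunk_concepts: dict[str, list[str]]) -> dict[tuple[str, str], dict]:
    """Merge concept pairs that share a chunk into weighted edges.

    chunk_concepts maps a chunk id to the concepts found in that chunk
    (hypothetical input standing in for the LLM output).
    """
    edges = defaultdict(lambda: {"weight": 0, "chunks": []})
    for chunk_id, concepts in chunk_concepts.items():
        # Normalize and deduplicate so each pair counts once per chunk.
        unique = sorted({c.lower().strip() for c in concepts})
        for a, b in combinations(unique, 2):
            edge = edges[(a, b)]
            edge["weight"] += 1  # assumed weighting: +1 per shared chunk
            edge["chunks"].append(chunk_id)
    return dict(edges)


if __name__ == "__main__":
    concepts = {
        "c1": ["Mary", "lamb", "school"],
        "c2": ["Mary", "lamb"],
        "c3": ["lamb", "school"],
    }
    for pair, data in cooccurrence_edges(concepts).items():
        print(pair, data)
```

Sorting each pair gives an undirected edge a single canonical key, so ("lamb", "mary") and ("mary", "lamb") are merged rather than counted separately.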

Quick Start & Requirements

Install: Clone the repository, then install dependencies with poetry install or pip install -e . (run from the repository root).
Prerequisites: Python 3.11+, Poetry (recommended), and Ollama with the Mistral 7B OpenOrca model (or zephyr as per instructions) installed locally.
Verification: Run poetry run pytest or pytest.
Docs: Medium article explaining the method: https://medium.com/@rahulnyk/convert-any-corpus-of-text-into-a-graph-of-knowledge-93023333021e
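Once Ollama is running locally, a concept-extraction call might look like the sketch below. It uses Ollama's local REST endpoint (/api/generate on port 11434) through the standard library. The model tag mistral-openorca, the prompt wording, and the helper names build_payload, parse_concepts, and extract_concepts are assumptions for illustration; the project's actual prompts and client code may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "mistral-openorca"  # assumed model tag; use whichever model you pulled

# Hypothetical prompt, not the one used by the project.
PROMPT_TEMPLATE = (
    "List the key concepts in the text below as a JSON array of strings.\n\n"
    "Text: {chunk}"
)


def build_payload(chunk: str, model: str = MODEL) -> dict:
    """Build the request body for a single, non-streaming generation."""
    return {
        "model": model,
        "prompt": PROMPT_TEMPLATE.format(chunk=chunk),
        "stream": False,
    }


def parse_concepts(raw: str) -> list[str]:
    """Parse the model's text output; return [] if it is not a JSON array."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [str(item).strip() for item in data if str(item).strip()]


def extract_concepts(chunk: str) -> list[str]:
    """Send one chunk to the local Ollama server and return the parsed concepts."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(chunk)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False, the generated text is in the "response" field.
    return parse_concepts(body.get("response", ""))
```

Local models do not always return well-formed JSON, which is why parse_concepts falls back to an empty list instead of raising.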

Highlighted Details

Leverages Mistral 7B OpenOrca via Ollama for local, cost-free concept extraction.
Utilizes NetworkX for graph manipulation and Pyvis for interactive web-based visualizations.
Generates graph metrics like node degree and community structure.
Focuses on "concepts" over traditional entities for richer semantic representation.
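As a rough sketch of how the graph metrics listed above could be computed with NetworkX, the example below builds a weighted graph, stores each node's degree as a size attribute, and stores a community index as a group attribute. The helper names build_graph and annotate, the attribute names size and group, and the use of greedy modularity communities are assumptions for illustration; the project may use a different community-detection algorithm.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def build_graph(weighted_edges):
    """Build an undirected graph from (concept_a, concept_b, weight) triples."""
    graph = nx.Graph()
    for a, b, weight in weighted_edges:
        graph.add_edge(a, b, weight=weight)
    return graph


def annotate(graph):
    """Attach a community index ('group') and degree ('size') to every node."""
    communities = greedy_modularity_communities(graph, weight="weight")
    for group_id, members in enumerate(communities):
        for node in members:
            graph.nodes[node]["group"] = group_id  # e.g. used for coloring
    for node, degree in graph.degree():
        graph.nodes[node]["size"] = degree  # e.g. used for node sizing
    return graph


if __name__ == "__main__":
    # Two tightly connected triangles joined by one weak bridge edge.
    edges = [
        ("a", "b", 2), ("b", "c", 2), ("a", "c", 2),
        ("d", "e", 2), ("e", "f", 2), ("d", "f", 2),
        ("c", "d", 1),
    ]
    graph = annotate(build_graph(edges))
    for node, attrs in graph.nodes(data=True):
        print(node, attrs)
```

A visualization layer such as Pyvis can then read these node attributes to size and color the rendered graph.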

Maintenance & Community

The project is seeking contributions for backend improvements (embedding deduplication, concept normalization, filtering) and frontend development for interactive graph exploration.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is described as needing "a lot more work" and lists several suggested improvements, indicating it may be in an early or experimental stage. The lack of a specified license could pose a barrier to commercial adoption.

Explore Similar Projects

GraphRAG by Graph-RAG

llmgraph by dylanhogg

prettygraph by yoheinakajima

graph_maker by rahulnyk

G-Retriever by XiaoxinHe

itext2kg by AuvaLab

kg-gen by stair-lab

Awesome-Graph-LLM by XiaoxinHe

graph4nlp by graph4ai

knowledge-graph-llms by thu-vu92

ai-knowledge-graph by robert-mcdermott

GraphGPT by varunshenoy