Knowledge graph pipeline for text corpus analysis
This project provides a Python-based pipeline that converts any text corpus into a knowledge graph. It targets researchers and developers interested in Graph Augmented Generation (GRAG) or knowledge graph-based QnA. By representing concepts and the relationships between them, it supports deeper text analysis and better-grounded conversational AI.
How It Works
The pipeline splits the text into chunks, extracts concepts (rather than just entities) from each chunk using a local LLM (Mistral 7B OpenOrca), and infers a relationship between any two concepts that co-occur in the same chunk. Each edge therefore corresponds to a chunk in which two concepts appear together. When the same pair occurs more than once, the occurrences are merged into a single edge: the weights are combined and the relationship descriptions are concatenated. The system also calculates node degrees and communities, which drive node sizing and coloring in the visualization.
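The co-occurrence step described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function names `split_into_chunks`, `cooccurrence_edges`, and `node_degrees` are hypothetical, the chunking is a naive word-boundary split, and the LLM extraction step is replaced by a hand-written list of concepts per chunk. Community detection is left out.

```python
from collections import Counter, defaultdict
from itertools import combinations


def split_into_chunks(text, chunk_size=200):
    """Split text into chunks of at most roughly chunk_size characters,
    breaking only on whitespace (hypothetical helper)."""
    chunks, current, length = [], [], 0
    for word in text.split():
        if current and length + len(word) + 1 > chunk_size:
            chunks.append(" ".join(current))
            current, length = [], 0
        current.append(word)
        length += len(word) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def cooccurrence_edges(concepts_per_chunk):
    """Return a Counter mapping each unordered concept pair to the
    number of chunks in which both concepts appear."""
    weights = Counter()
    for concepts in concepts_per_chunk:
        # Deduplicate and sort so (a, b) and (b, a) map to the same key.
        for a, b in combinations(sorted(set(concepts)), 2):
            weights[(a, b)] += 1
    return weights


def node_degrees(weights):
    """Count the distinct neighbours of each concept."""
    degrees = defaultdict(int)
    for a, b in weights:
        degrees[a] += 1
        degrees[b] += 1
    return dict(degrees)


# Assumed stand-in for the LLM extraction output: one concept list per chunk.
concepts_per_chunk = [
    ["knowledge graph", "llm", "concept"],
    ["llm", "concept"],
    ["knowledge graph", "community"],
]
edges = cooccurrence_edges(concepts_per_chunk)
degrees = node_degrees(edges)
```

Here the pair ("concept", "llm") appears in two chunks, so its edge carries twice the weight of the others. The degrees could then be used to size nodes in a visualization, as the description suggests.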
Quick Start & Requirements
Install with poetry install or pip install -e .
Requires zephyr (as per instructions) installed locally.
Run tests with poetry run pytest or pytest.
Highlighted Details
Maintenance & Community
The project is seeking contributions for backend improvements (embedding deduplication, concept normalization, filtering) and frontend development for interactive graph exploration.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is described as needing "a lot more work" and lists several suggested improvements, indicating it may be in an early or experimental stage. The lack of a specified license could pose a barrier to commercial adoption.