knowledge_graph_maker by rahulnyk

Python library for text-to-knowledge graph conversion

Created 2 years ago

259 stars

Top 97.7% on SourcePulse

Project Summary

A Python library that converts arbitrary text into structured knowledge graphs using user-defined ontologies and Large Language Models (LLMs). It targets researchers, developers, and power users who want to analyze textual data, surface relationships that are not obvious from the text, and build applications such as Graph Retrieval Augmented Generation (GRAG) over their documents. The primary benefit is a queryable, interconnected representation of what the text contains.

How It Works

The core approach is to define a graph ontology, using Pydantic models, that lists the allowed entity labels and relationships. Input text is split into chunks (800-1200 tokens recommended) so that each one fits within the LLM's context window. Each chunk is processed by the selected LLM client (OpenAI or Groq) through a GraphMaker instance, which uses tuned prompts to extract entities and relationships that conform to the ontology. When an LLM response fails to parse as JSON, the library attempts to correct it automatically and falls back to manually splitting the response. The output is a list of Edge objects representing the knowledge graph, which can optionally be persisted to Neo4j for further analysis, visualization, or RAG.
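The flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's API: `Ontology`, `Edge`, `chunk_text`, `fake_llm`, and `build_graph` are hypothetical names, dataclasses stand in for the Pydantic models, and whitespace-separated words stand in for tokens.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Ontology:
    """Hypothetical stand-in for the Pydantic ontology model."""
    labels: List[str]
    relationships: List[str]


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record: two nodes and the relationship between them."""
    node_1: str
    node_2: str
    relationship: str


def chunk_text(text: str, max_words: int = 1000) -> List[str]:
    """Split text into chunks of at most max_words words (a rough proxy for tokens)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM client; always returns one fixed edge as JSON."""
    return json.dumps(
        [{"node_1": "Alice", "node_2": "Acme", "relationship": "works at"}]
    )


def build_graph(
    text: str,
    ontology: Ontology,
    llm: Callable[[str], str] = fake_llm,
    max_words: int = 1000,
) -> List[Edge]:
    """Send each chunk to the LLM with the ontology and collect the returned edges."""
    edges: List[Edge] = []
    for chunk in chunk_text(text, max_words):
        prompt = (
            f"Labels: {ontology.labels}\n"
            f"Relationships: {ontology.relationships}\n"
            f"Text: {chunk}"
        )
        for item in json.loads(llm(prompt)):
            edges.append(Edge(**item))
    return edges
```

Passing the LLM as a plain callable keeps the sketch testable without an API key; a real client would replace `fake_llm`.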

Quick Start & Requirements

Installation: pip install knowledge-graph-maker
Prerequisites:
- API keys for LLM providers: GROQ_API_KEY or OPENAI_API_KEY.
- Optional Neo4j credentials: NEO4J_USERNAME, NEO4J_PASSWORD, NEO4J_URI.
- Python environment.
Setup: Requires setting environment variables for chosen LLM and database services.
Links: No explicit links to official quick-start guides or demos were found in the provided text.
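Before running anything, it can help to confirm which of the environment variables listed above are actually set. The following is a small sketch using only the standard library; `check_environment` is a hypothetical helper, not part of the package.

```python
import os

# Variable names taken from the prerequisites above.
LLM_KEYS = ("GROQ_API_KEY", "OPENAI_API_KEY")
NEO4J_KEYS = ("NEO4J_USERNAME", "NEO4J_PASSWORD", "NEO4J_URI")


def check_environment(env=None):
    """Report which LLM keys are set and whether Neo4j is fully configured.

    `env` defaults to os.environ; pass a dict to test without touching the
    real environment.
    """
    env = os.environ if env is None else env
    llm_keys = [key for key in LLM_KEYS if env.get(key)]
    missing_neo4j = [key for key in NEO4J_KEYS if not env.get(key)]
    return {
        "llm_keys": llm_keys,               # at least one is needed
        "neo4j_ready": not missing_neo4j,   # Neo4j is optional
        "missing_neo4j": missing_neo4j,
    }


if __name__ == "__main__":
    status = check_environment()
    if not status["llm_keys"]:
        print("Set GROQ_API_KEY or OPENAI_API_KEY before extracting a graph.")
    if not status["neo4j_ready"]:
        print("Neo4j persistence disabled; missing:", status["missing_neo4j"])
```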

Highlighted Details

- Supports both OpenAI and Groq LLM clients out of the box, with options for custom LLM implementations.
- Includes fault-tolerant parsing for LLM-generated graph data, with automatic JSON correction and manual splitting strategies.
- Optional integration with Neo4j allows for graph database storage, enabling network algorithms, visualization (e.g., Neo4j Bloom), and RAG applications.
- Ontology definition uses Pydantic models, providing a structured way to guide entity and relationship extraction.
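To make the fault-tolerant parsing idea concrete, here is one way such a fallback chain could look. It is a sketch of the general technique, not the library's actual correction logic: `parse_edges` is a hypothetical name, and the three stages (direct parse, bracket extraction, per-object splitting) are assumptions chosen for illustration.

```python
import json
import re


def parse_edges(raw: str) -> list:
    """Parse an LLM response into a list of edge dicts, tolerating common damage."""
    # Stage 1: the response is already valid JSON.
    try:
        data = json.loads(raw)
        return data if isinstance(data, list) else [data]
    except json.JSONDecodeError:
        pass

    # Stage 2: valid JSON array wrapped in extra prose; keep only "[ ... ]".
    start, end = raw.find("["), raw.rfind("]")
    if start != -1 and end > start:
        try:
            return json.loads(raw[start:end + 1])
        except json.JSONDecodeError:
            pass

    # Stage 3: split into flat "{...}" objects and keep the ones that parse.
    edges = []
    for match in re.finditer(r"\{[^{}]*\}", raw):
        try:
            edges.append(json.loads(match.group(0)))
        except json.JSONDecodeError:
            continue  # drop the damaged object, keep the rest
    return edges
```

The last stage trades completeness for progress: one malformed object costs a single edge instead of the whole chunk. The regular expression only matches objects without nested braces, which is enough for flat edge records.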

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), sponsorships, or roadmap were provided in the README excerpt.

Licensing & Compatibility

The license type is not explicitly stated in the provided README content. Verify the license in the repository before commercial use or integration into closed-source projects.

Limitations & Caveats

The accuracy of the generated knowledge graph depends on the chosen LLM, the defined ontology, and the quality of the input text. LLM context window limits make text chunking necessary, and relationships that span chunk boundaries may be missed, which can leave the graph fragmented. API rate limits for LLM services may require delays between processing chunks.
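The delay mentioned above can be as simple as sleeping between chunks. The sketch below assumes a hypothetical `process_chunks` helper and a caller-supplied `process` function; the default of 10 seconds is an arbitrary placeholder, not a documented recommendation.

```python
import time


def process_chunks(chunks, process, delay_s=10.0, sleep=time.sleep):
    """Apply `process` to each chunk, pausing delay_s seconds between calls.

    The pause happens only between chunks, never after the last one.
    `sleep` is injectable so the pacing can be tested without waiting.
    """
    results = []
    for index, chunk in enumerate(chunks):
        if index > 0:
            sleep(delay_s)
        results.append(process(chunk))
    return results
```

A fixed delay is the simplest option; services that report rate-limit errors explicitly may be better served by retrying with a growing backoff.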

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive

Star History: 5 stars in the last 30 days