AutoSchemaKG by HKUST-KnowComp

Framework for autonomous knowledge graph construction

Created 4 months ago
490 stars

Top 63.0% on SourcePulse

Project Summary

AutoSchemaKG is a framework for automated knowledge graph (KG) construction from unstructured text, aimed at researchers and developers who need to build KGs without a predefined schema. It combines LLM-based triple extraction with schema induction, which enables zero-shot inference; the project reports state-of-the-art performance on benchmarks.

How It Works

AutoSchemaKG employs a two-stage approach: first, it extracts entities and events as triples from text using Large Language Models (LLMs). Second, it induces a schema through conceptualization, creating semantic links between disparate information. This method allows for autonomous KG construction and generalization across domains.
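
The two stages can be pictured with a minimal Python sketch. This is not the atlas-rag API: `fake_llm_extract`, `fake_llm_conceptualize`, `build_kg`, and `group_by_concept` are hypothetical names, and the two "LLM" functions return canned data so the example runs without a model. In practice both would be prompts sent to an LLM.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def fake_llm_extract(text: str) -> List[Triple]:
    """Hypothetical stand-in for stage 1: an LLM call that returns triples."""
    return [
        ("Marie Curie", "discovered", "polonium"),
        ("Marie Curie", "won", "Nobel Prize in Physics"),
    ]


def fake_llm_conceptualize(node: str) -> List[str]:
    """Hypothetical stand-in for stage 2: an LLM call that names concepts."""
    canned = {
        "Marie Curie": ["scientist", "person"],
        "polonium": ["chemical element"],
        "Nobel Prize in Physics": ["award"],
    }
    return canned.get(node, ["thing"])


def build_kg(
    text: str,
    extract: Callable[[str], List[Triple]],
    conceptualize: Callable[[str], List[str]],
) -> Tuple[List[Triple], Dict[str, List[str]]]:
    """Stage 1: extract triples. Stage 2: attach concept labels to each node."""
    triples = extract(text)
    nodes = sorted({t[0] for t in triples} | {t[2] for t in triples})
    concepts = {node: conceptualize(node) for node in nodes}
    return triples, concepts


def group_by_concept(concepts: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert node -> concepts so nodes sharing a concept become linked."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for node, labels in concepts.items():
        for label in labels:
            groups[label].add(node)
    return dict(groups)


triples, concepts = build_kg(
    "Marie Curie discovered polonium and won the Nobel Prize in Physics.",
    fake_llm_extract,
    fake_llm_conceptualize,
)
schema = group_by_concept(concepts)
```

The inverted `schema` mapping is where the "semantic links" appear in this sketch: any two nodes filed under the same concept become connected even if no extracted triple mentions them together.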

Quick Start & Requirements

  • Install via pip: pip install atlas-rag
  • For NV-embed-v2 support: pip install atlas-rag[nvembed]
  • Requires Python; the transformers dependency may need to fall within a specific range (>=4.42.4, <=4.47.1).
  • PDF processing requires a separate environment with marker-pdf and google-genai.
  • See example notebooks for detailed usage: atlas_billion_kg_usage.ipynb, atlas_full_pipeline.ipynb, atlas_multihopqa.ipynb.
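
Before running the notebooks, it may help to confirm that the installed transformers version falls inside the range listed above. The following is a small sketch using only the standard library; the helper names (`parse_version`, `in_supported_range`, `installed_transformers_ok`) are illustrative and not part of atlas-rag, and the version parsing is deliberately simplified (it reads leading numeric components only).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Bounds taken from the requirement listed above.
MIN_VERSION = (4, 42, 4)
MAX_VERSION = (4, 47, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse leading numeric components only, e.g. '4.45.0.dev0' -> (4, 45, 0)."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric component (simplification)
        parts.append(int(piece))
    return tuple(parts)


def in_supported_range(text: str) -> bool:
    """True if the version string lies within [MIN_VERSION, MAX_VERSION]."""
    return MIN_VERSION <= parse_version(text) <= MAX_VERSION


def installed_transformers_ok() -> Optional[bool]:
    """Check the installed transformers package; None if it is not installed."""
    try:
        return in_supported_range(version("transformers"))
    except PackageNotFoundError:
        return None


status = installed_transformers_ok()
if status is None:
    print("transformers is not installed")
elif status:
    print("transformers version is within the listed range")
else:
    print("transformers version is outside the listed range")
```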

Highlighted Details

  • Implements the ATLAS family of KGs (ATLAS-Wiki, ATLAS-Pes2o, ATLAS-CC) with over 900M nodes and 5.9B edges.
  • Supports Retrieval Augmented Generation (RAG) over constructed KGs.
  • Includes modules for KG quality, factual consistency, and general task performance evaluation.
  • Offers PDF-to-Markdown conversion for KG construction.
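
To illustrate what RAG over a KG involves, here is a toy Python sketch that retrieves triples relevant to a question and formats them into a prompt. It is an assumption-laden illustration, not the project's retriever: scoring uses plain token overlap in place of embeddings, and `tokenize`, `retrieve`, and `build_prompt` are hypothetical names.

```python
import re
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, triples: List[Triple], k: int = 2) -> List[Triple]:
    """Return up to k triples ranked by token overlap with the question."""
    q_tokens = tokenize(question)
    scored = [
        (len(q_tokens & tokenize(" ".join(t))), i, t)
        for i, t in enumerate(triples)
    ]
    # Drop triples with no overlap; break ties by original order.
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [t for _, _, t in scored[:k]]


def build_prompt(question: str, facts: List[Triple]) -> str:
    """Format retrieved triples as context ahead of the question."""
    lines = [f"- {h} {r} {t}" for h, r, t in facts]
    return "Facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nAnswer:"


kg = [
    ("Marie Curie", "discovered", "polonium"),
    ("Paris", "capital of", "France"),
    ("Marie Curie", "born in", "Warsaw"),
]
question = "Where was Marie Curie born?"
facts = retrieve(question, kg, k=2)
prompt = build_prompt(question, facts)
```

The resulting `prompt` would then be passed to an LLM of choice. An embedding-based retriever would replace `retrieve` while leaving the rest of the flow unchanged.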

Maintenance & Community

  • Project is actively updated, with recent changes including batch generation and PDF support.
  • Contact information for key contributors is provided.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • PDF processing requires setting up a separate Conda environment due to dependency versioning.
  • The framework relies heavily on LLMs, which may introduce costs and potential biases.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 3
  • Star history: 37 stars in the last 30 days
