LLM-based-causal-discovery  by WXY604

LLM-powered causal discovery toolkit

Created 3 months ago
561 stars

Top 57.2% on SourcePulse

Project Summary

This toolkit addresses the challenge of inferring causal relationships from observational data by integrating Large Language Models (LLMs) to generate prior knowledge, thereby reducing reliance on costly domain expertise. It is designed for data scientists and researchers seeking more robust and cost-effective causal discovery.

How It Works

The toolkit uses LLMs to elicit structured knowledge, focusing on the temporal ordering of variables, which the project reports to be more stable than direct causal judgments. It then reconciles these potentially inconsistent LLM outputs into a single, globally consistent variable ordering. That ordering serves as a prior to guide mainstream causal discovery algorithms, with the aim of improving accuracy and reliability.
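The reconciliation step can be illustrated with a minimal sketch. It assumes the LLM answers arrive as pairwise "X comes before Y" votes and resolves contradictions with a simple net-vote ranking. This is an illustrative stand-in, not the toolkit's actual aggregation method, and the function names, variable names, and votes are all hypothetical.

```python
def aggregate_order(variables, votes):
    """Turn possibly contradictory (earlier, later) votes into one ordering.

    Each vote adds 1 to the 'earlier' variable and subtracts 1 from the
    'later' one. Variables are then sorted by net score, highest first.
    Ties keep the order in which the variables were listed.
    """
    score = {v: 0 for v in variables}
    for earlier, later in votes:
        score[earlier] += 1
        score[later] -= 1
    return sorted(variables, key=lambda v: (-score[v], variables.index(v)))


def forbidden_edges(order):
    """Edges that point backwards in time under the given ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return {(a, b) for a in order for b in order if pos[a] > pos[b]}


# Hypothetical variables and LLM votes; note the contradictory pair
# ("smoking", "cancer") and ("cancer", "smoking").
variables = ["age", "smoking", "tar", "cancer"]
votes = [
    ("age", "smoking"),
    ("smoking", "tar"),
    ("tar", "cancer"),
    ("smoking", "cancer"),
    ("cancer", "smoking"),
    ("age", "cancer"),
]

order = aggregate_order(variables, votes)
blocked = forbidden_edges(order)
print(order)
print(len(blocked), "edges ruled out by the ordering")
```

Under these assumptions, the two contradictory votes cancel out, and the resulting ordering rules out every edge from a later variable to an earlier one. A causal discovery algorithm could then treat that set as a prior.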

Quick Start & Requirements

  • End-Users (Recommended): Download the latest pre-built Linux executable from GitHub Releases. Requires graphviz and unrar.
    • Install prerequisites: sudo apt-get update && sudo apt-get install graphviz unrar
    • Download and extract the .rar archive.
    • Change into the extracted directory, make the binary executable with chmod +x CD, then run ./CD.
  • Developers: Clone the repository, set up a Python virtual environment, and install dependencies via pip install -r requirements.txt.
    • Run causal discovery: python tools/causal_discovery/main.py
    • LLM-assisted discovery requires generating an LLM knowledge matrix.

Highlighted Details

  • Two LLM prompting strategies are detailed: one focusing on temporal order, another using dual-expert reasoning (Conservative and Exploratory) to generate "Harmonized Priors" with path existence and edge absence constraints.
  • A three-stage prompting process (Variable Understanding, Causal Discovery, Error Revision) is also presented for extracting and refining causal knowledge.
  • The toolkit supports integrating LLM-derived priors as hard or soft constraints into score-based causal structure learning algorithms.
  • Includes synthetic datasets and scripts for generating prior matrices from ground truth or LLM knowledge.
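The difference between hard and soft constraints can be sketched as follows. The example assumes a prior matrix in which 1 means "edge expected", -1 means "edge forbidden", and 0 means "no opinion". This encoding, the penalty weight, and the function names are assumptions made for illustration; the toolkit's own matrix format and scoring functions may differ.

```python
def count_violations(adj, prior):
    """Count entries where a candidate graph disagrees with the prior.

    adj[i][j] is 1 if the candidate graph has an edge i -> j, else 0.
    prior[i][j] is 1 (edge expected), -1 (edge forbidden), or 0 (no opinion).
    """
    n = len(adj)
    violations = 0
    for i in range(n):
        for j in range(n):
            if prior[i][j] == 1 and adj[i][j] == 0:
                violations += 1  # expected edge is missing
            elif prior[i][j] == -1 and adj[i][j] == 1:
                violations += 1  # forbidden edge is present
    return violations


def constrained_score(data_score, adj, prior, mode="soft", penalty=2.0):
    """Combine a data-fit score (higher is better) with the prior.

    hard: any violation disqualifies the candidate graph.
    soft: each violation subtracts a fixed penalty from the score.
    """
    v = count_violations(adj, prior)
    if mode == "hard":
        return float("-inf") if v else data_score
    return data_score - penalty * v


# Hypothetical 3-variable example: the prior expects 0 -> 1 and forbids 1 -> 0.
prior = [[0, 1, 0],
         [-1, 0, 0],
         [0, 0, 0]]
candidate = [[0, 0, 0],
             [1, 0, 1],
             [0, 0, 0]]

soft = constrained_score(-100.0, candidate, prior, mode="soft")
hard = constrained_score(-100.0, candidate, prior, mode="hard")
print(soft, hard)
```

In this sketch a soft prior still lets the search keep a graph that contradicts the LLM if the data fit is strong enough, while a hard prior removes such graphs from consideration entirely.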

Maintenance & Community

No specific community links (Discord/Slack) or notable contributors are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Python dependencies for the source install are listed in requirements.txt, and the pre-built executable targets Linux.

Limitations & Caveats

The README notes that not all algorithm parameters may be fully functional in the current development phase. The effectiveness of LLM-generated priors depends on the quality of the LLM's output and on the chosen integration strategy.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 41 stars in the last 30 days
