LLM-based-causal-discovery by WXY604

LLM-powered causal discovery toolkit

Created 7 months ago

839 stars

Top 42.5% on SourcePulse

Project Summary

This toolkit addresses the challenge of inferring causal relationships from observational data by integrating Large Language Models (LLMs) to generate prior knowledge, thereby reducing reliance on costly domain expertise. It is designed for data scientists and researchers seeking more robust and cost-effective causal discovery.

How It Works

The toolkit uses LLMs to elicit structured knowledge, focusing on the temporal ordering of variables, which the project reports to be more stable than direct causal judgments. Because individual LLM answers can contradict one another, it then integrates and refines them into a single, globally consistent variable ordering. This refined ordering serves as a prior that guides mainstream causal discovery algorithms, with the aim of improving accuracy and reliability.
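To make the "inconsistent outputs into a consistent ordering" step concrete, here is a minimal sketch that ranks variables by net votes (a simple Borda-style rule). This is a stand-in technique chosen for illustration, not necessarily the method the toolkit uses; the function name `aggregate_order` and the example variables are hypothetical.

```python
from collections import defaultdict

def aggregate_order(variables, judgments):
    """Turn possibly contradictory (earlier, later) votes into one total order.

    Assumption: a simple net-vote ranking, used here only as an illustration.
    """
    net = defaultdict(int)
    for earlier, later in judgments:
        net[earlier] += 1  # one vote for "comes first"
        net[later] -= 1    # one vote for "comes later"
    # Higher net score means earlier in time; ties are broken by name.
    return sorted(variables, key=lambda v: (-net[v], v))

variables = ["age", "smoking", "tar", "cancer"]
judgments = [
    ("age", "smoking"), ("age", "tar"), ("age", "cancer"),
    ("smoking", "tar"), ("smoking", "cancer"), ("tar", "cancer"),
    ("cancer", "smoking"),  # a contradictory answer from one LLM query
]
order = aggregate_order(variables, judgments)
print(order)
```

The single contradictory vote is outweighed by the others, so the result is still a total order with no cycle.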

Quick Start & Requirements

End-Users (Recommended): Download the latest pre-built Linux executable from GitHub Releases. Requires graphviz and unrar.
- Install prerequisites: sudo apt-get update && sudo apt-get install graphviz unrar
- Download and extract the .rar archive.
- Change into the extracted directory, make the binary executable with chmod +x CD, then run ./CD.
Developers: Clone the repository, set up a Python virtual environment, and install dependencies via pip install -r requirements.txt.
- Run causal discovery: python tools/causal_discovery/main.py
- LLM-assisted discovery requires generating an LLM knowledge matrix.
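The summary does not describe the layout of the LLM knowledge matrix. As a rough sketch under an assumed encoding (entry [i][j] is 1 if an edge from variable i to variable j is allowed by the temporal order, 0 if it is ruled out), a variable ordering could be turned into such a matrix like this. The function name `order_to_prior` and the 0/1 encoding are hypothetical and may differ from the repository's format.

```python
def order_to_prior(variables, order):
    """Build an n x n 0/1 matrix from a temporal ordering.

    Assumed encoding (hypothetical): prior[i][j] == 1 means the edge
    variables[i] -> variables[j] is permitted because i precedes j.
    """
    rank = {name: position for position, name in enumerate(order)}
    return [
        [1 if rank[a] < rank[b] else 0 for b in variables]
        for a in variables
    ]

variables = ["cancer", "age", "smoking"]   # column order of the dataset
order = ["age", "smoking", "cancer"]       # ordering elicited from the LLM
prior = order_to_prior(variables, order)
for row in prior:
    print(row)
```

Any edge pointing backwards in time gets a 0, so the matrix can only describe acyclic graphs.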

Highlighted Details

Two LLM prompting strategies are detailed: one focusing on temporal order, another using dual-expert reasoning (Conservative and Exploratory) to generate "Harmonized Priors" with path existence and edge absence constraints.
A three-stage prompting process (Variable Understanding, Causal Discovery, Error Revision) is also presented for extracting and refining causal knowledge.
The toolkit supports integrating LLM-derived priors as hard or soft constraints into score-based causal structure learning algorithms.
Includes synthetic datasets and scripts for generating prior matrices from ground truth or LLM knowledge.
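The difference between hard and soft constraints in score-based learning can be sketched as follows. This is a generic illustration, not the toolkit's implementation: `data_score` stands in for whatever fit score the algorithm computes (a BIC-style value, for example), and `constrained_score`, `forbidden`, and the `penalty` weight are hypothetical names.

```python
import math

def constrained_score(edges, data_score, forbidden, mode="soft", penalty=2.0):
    """Combine a data-fit score with prior knowledge about forbidden edges.

    hard: any graph containing a forbidden edge is rejected outright.
    soft: each forbidden edge lowers the score by a fixed penalty.
    """
    violations = sum(1 for edge in edges if edge in forbidden)
    if mode == "hard":
        return -math.inf if violations else data_score
    return data_score - penalty * violations

forbidden = {("cancer", "smoking")}  # edge ruled out by the prior
candidate = [("age", "smoking"), ("cancer", "smoking")]

hard = constrained_score(candidate, -100.0, forbidden, mode="hard")
soft = constrained_score(candidate, -100.0, forbidden, mode="soft")
print(hard, soft)
```

A hard constraint removes the candidate graph from the search entirely, while a soft constraint lets strong evidence in the data override an unreliable prior.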

Maintenance & Community

No specific community links (Discord/Slack) or notable contributors are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license, so the terms for reuse are unclear. The developer setup depends on the Python packages listed in requirements.txt, and the pre-built executable is provided for Linux only.

Limitations & Caveats

The README notes that not all algorithm parameters may be fully functional in the current development phase. The effectiveness of LLM-generated priors is dependent on the LLM's output quality and the chosen integration strategy.
