graphify by Graphify-Labs

AI knowledge graph for code and multimodal data

Created 3 months ago

79,407 stars

Top 0.3% on SourcePulse

3 Experts Love This Project

Elie Bursztein, Cybersecurity Lead at Google DeepMind
Kevin Hou, Head of Product Engineering at Windsurf
Shyamal Anadkat, Research Scientist at OpenAI

Project Summary

Graphify is an AI coding assistant skill that turns any collection of code, documentation, papers, or images into a queryable knowledge graph. It helps users understand complex codebases faster, uncover architectural rationale, and navigate large bodies of information efficiently. The primary benefit is a substantial reduction in token usage per query compared with processing raw files, which makes AI-assisted code analysis cheaper. Because the graph is saved as a file, it also persists across sessions.

How It Works

Graphify uses a two-pass process. First, a deterministic Abstract Syntax Tree (AST) pass parses code files and extracts structural elements such as classes, functions, imports, and call graphs, with no LLM involvement. Second, Claude sub-agents process documents, papers, and images to extract concepts and relationships. The results are merged into a NetworkX graph, clustered with Leiden community detection based on graph topology (edge density), and exported. The novel aspect is that the graph structure itself serves as the similarity signal, which removes the need for a separate embedding step or vector database. Each relationship is tagged as EXTRACTED, INFERRED (with a confidence score), or AMBIGUOUS.
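The deterministic first pass can be illustrated with Python's standard ast module. This is a minimal sketch of the idea, not Graphify's actual implementation: the Edge record, the extract_edges function, and the relation labels ("defines", "imports", "calls") are hypothetical names chosen for illustration. Only the EXTRACTED, INFERRED, and AMBIGUOUS tags come from the description above.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """Hypothetical edge record; field names are assumptions, not Graphify's schema."""
    source: str
    target: str
    relation: str             # "defines", "imports", or "calls" (illustrative labels)
    tag: str                  # "EXTRACTED", "INFERRED", or "AMBIGUOUS"
    confidence: float = 1.0   # structural facts read from the AST are certain


def extract_edges(module_name: str, source: str) -> list[Edge]:
    """Walk a module's AST and emit EXTRACTED edges without any LLM call."""
    edges: list[Edge] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                edges.append(Edge(module_name, alias.name, "imports", "EXTRACTED"))
        elif isinstance(node, ast.ImportFrom) and node.module:
            edges.append(Edge(module_name, node.module, "imports", "EXTRACTED"))
        elif isinstance(node, ast.ClassDef):
            edges.append(Edge(module_name, node.name, "defines", "EXTRACTED"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            edges.append(Edge(module_name, node.name, "defines", "EXTRACTED"))
            # Record direct calls to plain names made inside this function.
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    edges.append(Edge(node.name, inner.func.id, "calls", "EXTRACTED"))
    return edges


SAMPLE = '''
import os
from pathlib import Path

class Loader:
    def load(self, path):
        return parse(path)

def parse(path):
    return Path(path).read_text()
'''

edges = extract_edges("sample", SAMPLE)
for edge in edges:
    print(edge.source, edge.relation, edge.target, edge.tag)
```

Edges produced this way would all carry the EXTRACTED tag, since they are read directly from syntax. Under this sketch, the INFERRED and AMBIGUOUS tags would be reserved for relationships proposed by the second, model-driven pass.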

Quick Start & Requirements

Graphify requires Python 3.10+. To install, run pip install graphifyy (the PyPI package is temporarily named graphifyy while the graphify name is reclaimed), then run graphify install. Platform-specific installation commands exist for Claude Code, Codex, OpenCode, OpenClaw, and Factory Droid. Codex users must set multi_agent = true in their config. Running /graphify inside a project folder starts the process. An AI coding assistant (Claude Code, Codex, etc.) is required for semantic extraction.

Highlighted Details

Multimodal Input: Processes code, PDFs, Markdown, screenshots, diagrams, and images in any language via Claude vision.
Token Efficiency: Claims up to 71.5x fewer tokens per query compared to reading raw files, with savings compounding across sessions due to caching.
Persistent Knowledge: Generates graph.json for querying weeks later without re-reading source files.
Automated Integration: Supports optional "always-on" integration with AI assistants via hooks or AGENTS.md files, and Git hooks for automatic graph rebuilding on commit/checkout.
Rich Outputs: Generates interactive HTML graphs, plain-language audit reports (GRAPH_REPORT.md), queryable JSON, and supports exports to SVG, GraphML, and Neo4j.
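To make the persistent, queryable graph.json idea concrete, the sketch below answers a neighborhood question from a small JSON document instead of re-reading source files. The schema is an assumption: this summary does not describe the real layout of graph.json, so the example borrows the "nodes" and "links" keys of a generic node-link format. The neighbors helper and the sample node names are hypothetical as well.

```python
import json

# Hypothetical graph.json content; the real schema is not described in this summary.
GRAPH_JSON = '''
{
  "nodes": [{"id": "Loader"}, {"id": "parse"}, {"id": "caching design note"}],
  "links": [
    {"source": "Loader", "target": "parse",
     "tag": "EXTRACTED", "confidence": 1.0},
    {"source": "Loader", "target": "caching design note",
     "tag": "INFERRED", "confidence": 0.6}
  ]
}
'''


def neighbors(graph: dict, node_id: str, min_confidence: float = 0.0) -> list[str]:
    """Return nodes linked to node_id whose edge confidence meets the threshold."""
    found: list[str] = []
    for link in graph["links"]:
        # Links without a confidence value are treated as certain.
        if link.get("confidence", 1.0) < min_confidence:
            continue
        if link["source"] == node_id:
            found.append(link["target"])
        elif link["target"] == node_id:
            found.append(link["source"])
    return found


graph = json.loads(GRAPH_JSON)
all_neighbors = neighbors(graph, "Loader")
trusted = neighbors(graph, "Loader", min_confidence=0.9)
print(all_neighbors)
print(trusted)
```

Filtering on confidence is one way a consumer might separate facts read from code from relationships a model inferred. The 0.9 threshold is arbitrary.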

Maintenance & Community

The project encourages contributions through "Worked examples" and reporting extraction bugs. It references ARCHITECTURE.md for details on adding language support. No specific community channels (like Discord/Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The README does not explicitly state the software license. This omission requires clarification for assessing commercial use or closed-source linking compatibility.

Limitations & Caveats

Semantic extraction from non-code files relies on external AI model APIs (Anthropic, OpenAI), which requires API keys and may incur costs. Code processing runs locally via the AST pass. Support for the OpenClaw platform is early-stage and runs agents sequentially. The PyPI package name (graphifyy) differs from the CLI command (graphify).

Health Check

Last Commit: 23 hours ago
Responsiveness: Inactive
Pull Requests (30d): 256
Issues (30d): 239

Star History

16,510 stars in the last 30 days