Epstein-doc-explorer by maxandrews

AI-powered graph explorer for legal documents

Created 1 month ago
272 stars

Top 94.9% on SourcePulse

Summary

This project is a document analysis and network visualization system for making sense of a large, interconnected set of legal documents. It targets researchers, legal professionals, and power users, who can extract, explore, and visualize the relationships, entities, and events in the Epstein document corpus through an interactive knowledge graph.

How It Works

The system has a two-phase architecture. The Analysis Pipeline, written in TypeScript and using Claude AI, performs entity and relationship extraction, semantic tagging, tag clustering (K-means), and LLM-based entity deduplication. The Visualization Interface, built with React and D3.js, renders the results as an interactive, force-directed network graph. The LLM handles document understanding, while graph algorithms keep the extracted data efficient to explore.
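The tag-clustering step can be illustrated with a minimal K-means sketch. It assumes each tag has already been converted to a numeric vector; the source does not say how the project does this. The names `Vector`, `kMeans`, and `nearestCentroid` are hypothetical and not taken from the repository, and seeding the centroids with the first k points is a simplification chosen to keep the example deterministic.

```typescript
// Minimal K-means sketch. All names are hypothetical, not the repository's code.
type Vector = number[];

function squaredDistance(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Index of the centroid closest to `point` (ties go to the lower index).
function nearestCentroid(point: Vector, centroids: Vector[]): number {
  let best = 0;
  let bestDist = Infinity;
  for (let c = 0; c < centroids.length; c++) {
    const dist = squaredDistance(point, centroids[c]);
    if (dist < bestDist) {
      bestDist = dist;
      best = c;
    }
  }
  return best;
}

// Returns one cluster index per input point.
function kMeans(points: Vector[], k: number, maxIterations = 50): number[] {
  if (k < 1 || k > points.length) {
    throw new Error("k must be between 1 and the number of points");
  }
  // Assumption for this sketch: seed centroids with the first k points.
  let centroids: Vector[] = points.slice(0, k).map((p) => [...p]);
  let assignments: number[] = new Array<number>(points.length).fill(-1);
  const dim = points[0].length;

  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach every point to its nearest centroid.
    const next = points.map((p) => nearestCentroid(p, centroids));
    const changed = next.some((c, i) => c !== assignments[i]);
    assignments = next;
    if (!changed) break; // converged

    // Update step: move each centroid to the mean of its members.
    const sums = Array.from({ length: k }, () => new Array<number>(dim).fill(0));
    const counts = new Array<number>(k).fill(0);
    points.forEach((p, i) => {
      const c = assignments[i];
      counts[c]++;
      for (let d = 0; d < dim; d++) sums[c][d] += p[d];
    });
    centroids = centroids.map((old, c) =>
      counts[c] === 0 ? old : sums[c].map((s) => s / counts[c])
    );
  }
  return assignments;
}
```

Calling `kMeans(vectors, 30)` on tag vectors would give each tag a cluster index that a UI could use as a filter key. A production version would more likely use randomized or k-means++ seeding and several restarts.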

Quick Start & Requirements

  • Install/Run: Clone the repository. Install Node.js dependencies (npm install). Run the API server (npx tsx api_server.ts) and the frontend (cd network-ui && npm run dev) in separate terminals.
  • Prerequisites: Node.js. Access to Claude AI is required to run the analysis pipeline locally. The project uses a SQLite database.

Highlighted Details

  • AI-Powered Extraction: Utilizes Claude AI for sophisticated entity, relationship, and event extraction from legal documents.
  • Semantic Tag Clustering: Groups over 28,000 tags into 30 semantic clusters via K-means for enhanced filtering.
  • Entity Deduplication: Employs LLM-based similarity detection to merge duplicate entity mentions, ensuring data consistency.
  • Interactive Knowledge Graph: Features a force-directed graph with actor-centric views, density-based pruning, and dynamic filtering by category and hop distance.
  • Performance Optimizations: Includes materialized cluster IDs, indexed database columns, edge deduplication, and adaptive limits for smooth visualization.
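Two of the graph features listed above, edge deduplication and filtering by hop distance, can be sketched with a set-based pass and a breadth-first search. This is a minimal illustration under stated assumptions, not the project's implementation. The `Edge` shape and the functions `dedupeEdges`, `hopDistances`, and `filterByHops` are hypothetical, and edges are treated as undirected for the hop count.

```typescript
// Hypothetical edge shape; the repository's actual schema may differ.
interface Edge {
  source: string;
  target: string;
  label: string;
}

// Collapse repeated (source, target, label) triples, keeping the first occurrence.
function dedupeEdges(edges: Edge[]): Edge[] {
  const seen = new Set<string>();
  const result: Edge[] = [];
  for (const e of edges) {
    const key = JSON.stringify([e.source, e.target, e.label]);
    if (!seen.has(key)) {
      seen.add(key);
      result.push(e);
    }
  }
  return result;
}

// Breadth-first search from `start`, treating edges as undirected.
// Returns the hop distance of every node reachable within `maxHops`.
function hopDistances(edges: Edge[], start: string, maxHops: number): Map<string, number> {
  const neighbors = new Map<string, string[]>();
  const link = (a: string, b: string): void => {
    const list = neighbors.get(a);
    if (list) list.push(b);
    else neighbors.set(a, [b]);
  };
  for (const e of edges) {
    link(e.source, e.target);
    link(e.target, e.source);
  }

  const dist = new Map<string, number>([[start, 0]]);
  let frontier: string[] = [start];
  for (let hop = 1; hop <= maxHops && frontier.length > 0; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const n of neighbors.get(node) ?? []) {
        if (!dist.has(n)) {
          dist.set(n, hop);
          next.push(n);
        }
      }
    }
    frontier = next;
  }
  return dist;
}

// Keep only deduplicated edges whose endpoints both lie within `maxHops` of `start`.
function filterByHops(edges: Edge[], start: string, maxHops: number): Edge[] {
  const dist = hopDistances(edges, start, maxHops);
  return dedupeEdges(edges).filter((e) => dist.has(e.source) && dist.has(e.target));
}
```

For an actor-centric view, `filterByHops(allEdges, selectedActorId, 2)` would return the subgraph within two hops of the selected node, where `allEdges` and `selectedActorId` are placeholder names. Deduplicating before rendering keeps the force simulation from drawing the same link more than once.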

Maintenance & Community

The project is maintained by maxandrews, with contributions acknowledged from tensonaut. Community support is primarily through GitHub issues. No specific roadmap or external community channels (like Discord/Slack) are detailed.

Licensing & Compatibility

The project is released under the MIT License, which permits broad usage, including commercial applications and integration into closed-source projects.

Limitations & Caveats

The dataset is still being processed, and new documents are continuously added to the network, so the corpus and the resulting graph may change over time. Running the analysis pipeline locally requires access to the Claude AI API.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days
