OpenContracts by Open-Source-Legal

LLM workspace for unstructured document analytics

Created 2 years ago
925 stars

Top 39.5% on SourcePulse

Project Summary

OpenContracts is an open-source, API-first LLM workspace designed for enterprise document analytics. It lets users extract, redact, and manage data from unstructured documents, and it provides a prompt playground and annotation tools for deeper document analysis.

How It Works

The platform uses a pluggable microservice architecture for document processing, with support for custom parsers, embedders, and thumbnail generators. Its Django backend uses pgvector for hybrid vector storage, which combines structured metadata with vector embeddings. A LlamaIndex integration lets LLMs query documents through annotated features and vector stores, supporting question answering and bulk data extraction.
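
The idea of combining structured metadata with vector embeddings can be illustrated with a minimal, in-memory sketch. This is not OpenContracts code and does not use pgvector or Django: the `Annotation` record, the `hybrid_search` function, and the toy three-dimensional embeddings are all hypothetical, chosen only to show a metadata filter followed by a similarity ranking.

```python
import math
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record: structured metadata plus an embedding vector."""
    text: str
    label: str
    embedding: list


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(records, query_embedding, label, top_k=2):
    """Filter on structured metadata first, then rank by vector similarity."""
    candidates = [r for r in records if r.label == label]
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: the embeddings are made up for illustration.
records = [
    Annotation("Either party may terminate with 30 days notice.", "termination", [0.9, 0.1, 0.0]),
    Annotation("Termination for cause is effective immediately.", "termination", [0.7, 0.3, 0.1]),
    Annotation("Fees are payable within 45 days.", "payment", [0.95, 0.05, 0.0]),
]

results = hybrid_search(records, [1.0, 0.0, 0.0], label="termination")
for r in results:
    print(r.label, "|", r.text)
```

In a database-backed setup the same two steps would typically be expressed as a single query, with the metadata filter in the `WHERE` clause and the vector distance in the `ORDER BY`; the sketch above only mirrors that logic in plain Python.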

Quick Start & Requirements

Highlighted Details

  • API-first design for programmatic access and integration.
  • Pluggable architecture for easy extension of document formats and analysis tools.
  • Human annotation interface for manual document markup.
  • LlamaIndex wrapper for seamless integration with LLM-powered querying.
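
As a rough illustration of what a pluggable architecture can look like, the sketch below registers parsers by file extension and dispatches to them at parse time. The names `PARSERS`, `register_parser`, and `parse_document` are hypothetical and are not taken from the OpenContracts codebase; the project's actual extension mechanism may differ.

```python
import os
from typing import Callable, Dict

# Hypothetical registry mapping a file extension to a parser function.
PARSERS: Dict[str, Callable[[bytes], str]] = {}


def register_parser(extension: str):
    """Decorator that registers a parser for the given file extension."""
    def decorator(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register_parser(".txt")
def parse_plaintext(data: bytes) -> str:
    return data.decode("utf-8")


@register_parser(".md")
def parse_markdown(data: bytes) -> str:
    # Illustrative only: drops leading '#' heading markers, nothing more.
    lines = [line.lstrip("#").strip() for line in data.decode("utf-8").splitlines()]
    return "\n".join(lines)


def parse_document(filename: str, data: bytes) -> str:
    """Look up the parser for the file's extension and run it."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in PARSERS:
        raise ValueError(f"No parser registered for {extension!r}")
    return PARSERS[extension](data)


print(parse_document("notes.md", b"# Title\nBody text"))
```

Supporting a new format under this design means adding one decorated function; the dispatch code in `parse_document` stays unchanged.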

Maintenance & Community

  • Roadmap includes features like LlamaParse integration, benchmarking, enhanced extraction, streaming, and government data integration.
  • Acknowledges contributions from AllenAI's PAWLS and NLmatics nlm-ingestor.

Licensing & Compatibility

  • License: GPL-3.0.
  • Compatibility: GPL-3.0 is a copyleft license, so derivative works that are distributed must also be licensed under GPL-3.0. This may limit commercial or closed-source integrations.

Limitations & Caveats

The platform currently supports PDF and text-based formats (plaintext, Markdown). Support for other office formats (e.g., DOCX, XLSX) is planned but will likely rely on external tools that convert them to Markdown.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 38
  • Issues (30d): 18
  • Star history: 9 stars in the last 30 days
