cite by Open-Source-Legal

LLM workspace for unstructured document analytics

Created 3 years ago
1,338 stars

Top 29.5% on SourcePulse

Project Summary

OpenContracts is an open-source, API-first LLM workspace designed for enterprise document analytics. It empowers users to extract, redact, and manage data from unstructured documents, offering a prompt playground and annotation tools for deep document analysis.

How It Works

The platform utilizes a pluggable microservice architecture for document processing, supporting custom parsers, embedders, and thumbnail generators. It features a Django backend with pgvector for hybrid vector storage, enabling the combination of structured metadata and vector embeddings. LlamaIndex integration allows LLMs to query documents using annotated features and vector stores, facilitating intelligent question answering and bulk data extraction.
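
The hybrid storage idea above can be illustrated with a minimal, in-memory sketch: filter records on structured metadata first, then rank the survivors by vector similarity. This is not OpenContracts code and does not use pgvector or Django. The `Annotation` class, the `hybrid_search` function, and the sample data are all hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Annotation:
    """Hypothetical record pairing structured metadata with an embedding."""
    text: str
    metadata: dict
    embedding: list


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hybrid_search(records, query_embedding, filters, top_k=3):
    """Keep records whose metadata matches every filter, then rank by similarity."""
    # Step 1: structured filter (the role a SQL WHERE clause would play).
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in filters.items())
    ]
    # Step 2: order the remaining records by similarity to the query vector.
    ranked = sorted(
        candidates,
        key=lambda r: cosine_similarity(r.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data with made-up two-dimensional embeddings.
records = [
    Annotation("Termination clause", {"doc_type": "contract"}, [1.0, 0.0]),
    Annotation("Payment terms", {"doc_type": "contract"}, [0.0, 1.0]),
    Annotation("Termination notice", {"doc_type": "memo"}, [1.0, 0.1]),
]
results = hybrid_search(records, [0.9, 0.1], {"doc_type": "contract"})
for r in results:
    print(r.text)
```

In a database-backed setup, both steps would typically run in a single query so that the metadata filter narrows the set before the similarity ordering is applied.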

Quick Start & Requirements

Highlighted Details

  • API-first design for programmatic access and integration.
  • Pluggable architecture for easy extension of document formats and analysis tools.
  • Human annotation interface for manual document markup.
  • LlamaIndex wrapper for seamless integration with LLM-powered querying.
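
A pluggable design like the one highlighted above is often built around a registry that maps a file type to a handler. The sketch below shows that general pattern only. The names `PARSERS`, `register_parser`, and `parse_document` are hypothetical and are not taken from the OpenContracts plugin interface.

```python
import os
from typing import Callable, Dict

# Hypothetical registry: file extension -> function that turns bytes into text.
PARSERS: Dict[str, Callable[[bytes], str]] = {}


def register_parser(extension: str):
    """Decorator that registers a parser function for one file extension."""
    def decorator(func: Callable[[bytes], str]) -> Callable[[bytes], str]:
        PARSERS[extension.lower()] = func
        return func
    return decorator


@register_parser(".txt")
def parse_plaintext(data: bytes) -> str:
    # Plain text needs only decoding.
    return data.decode("utf-8")


@register_parser(".md")
def parse_markdown(data: bytes) -> str:
    # Markdown is kept as-is here; a real parser might extract structure.
    return data.decode("utf-8")


def parse_document(filename: str, data: bytes) -> str:
    """Dispatch to the parser registered for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    parser = PARSERS.get(ext)
    if parser is None:
        raise ValueError(f"No parser registered for {ext!r}")
    return parser(data)


print(parse_document("notes.TXT", b"hello"))
```

Adding support for a new format then means registering one more function, without touching the dispatch logic.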

Maintenance & Community

  • Roadmap includes features like LlamaParse integration, benchmarking, enhanced extraction, streaming, and government data integration.
  • Acknowledges contributions from AllenAI's PAWLS and NLmatics nlm-ingestor.

Licensing & Compatibility

  • License: GPL-3.0.
  • Compatibility: GPL-3.0 may impose copyleft restrictions on derivative works, potentially impacting commercial or closed-source integrations.

Limitations & Caveats

The project currently supports PDF and text-based formats (plaintext, Markdown). Support for other office formats (e.g., DOCX, XLSX) is planned but will likely rely on external tools that convert them to Markdown.

Health Check

  • Last commit: 14 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 600
  • Issues (30d): 208
  • Star history: 46 stars in the last 30 days
