unstract by Zipstack

No-code platform for structured document extraction via LLMs

created 1 year ago
5,513 stars

Top 9.3% on sourcepulse

Project Summary

Unstract is a no-code platform that uses Large Language Models (LLMs) to turn unstructured documents into structured data. Users build and deploy APIs or ETL pipelines for data extraction. It targets developers and business users who want to automate document processing workflows.

How It Works

Unstract follows a three-step "nirvana" process. First, users engineer prompts in a dedicated "Prompt Studio" to extract the desired fields from documents. The studio is an integrated environment for testing prompts against sample documents, inspecting LLM outputs, and developing the output schema. Second, the Prompt Studio project is configured either as a standalone API or as an ETL pipeline with specified input and output sources. Third, the configured workflow is deployed, which automates the structuring of incoming documents.

Quick Start & Requirements

  • Install: Clone the repository and run ./run-platform.sh.
  • Prerequisites: Linux or macOS (Intel/M-series), Docker, Docker Compose.
  • Access: Visit http://frontend.unstract.localhost and log in with username unstract and password unstract.
  • Documentation: Quick Start Guide available for initial setup and prompt engineering.
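
Before running ./run-platform.sh, it can help to confirm that the prerequisites listed above are present. The following is a minimal sketch using only the Python standard library; the function names check_prerequisites and compose_available are hypothetical helpers, not part of Unstract, and the check assumes Docker Compose is installed as the `docker compose` plugin.

```python
import platform
import shutil
import subprocess


def check_prerequisites(which=shutil.which, system=platform.system):
    """Return a list of problems; an empty list means the basics are in place.

    `which` and `system` are injectable so the check can be exercised
    without touching the real machine.
    """
    problems = []
    os_name = system()
    # The listed prerequisites are Linux or macOS (reported as "Darwin").
    if os_name not in ("Linux", "Darwin"):
        problems.append(f"unsupported OS: {os_name}")
    if which("docker") is None:
        problems.append("docker not found on PATH")
    return problems


def compose_available(run=subprocess.run):
    """Return True if `docker compose version` exits with status 0.

    Assumption: Compose is installed as the Docker CLI plugin.
    """
    try:
        result = run(["docker", "compose", "version"],
                     capture_output=True, check=False)
    except OSError:
        # Raised when the docker binary itself cannot be executed.
        return False
    return result.returncode == 0
```

Calling check_prerequisites() with no arguments inspects the current machine; if it returns an empty list and compose_available() returns True, the platform script should at least find the tools it needs.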

Highlighted Details

  • Supports multiple LLM providers (OpenAI, Google VertexAI, Anthropic, etc.).
  • Integrates with multiple vector databases (Qdrant, Weaviate, Pinecone) and embedding models.
  • Supports numerous text extractors (Unstructured.io, LlamaIndex Parse) and ETL sources/destinations (AWS S3, GCS, Snowflake, BigQuery).
  • Offers a hosted version with a 14-day free trial.

Maintenance & Community

  • Community contributions are welcomed via CONTRIBUTING.md.
  • Active community presence on Slack and social media (X/Twitter, LinkedIn).

Licensing & Compatibility

  • The README does not explicitly state the license.

Limitations & Caveats

  • Backup of the ENCRYPTION_KEY is critical; its loss or change will render existing adapters inaccessible.
  • Usage analytics are integrated via PostHog but can be disabled.
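
Because losing the ENCRYPTION_KEY makes existing adapters inaccessible, copying it to a safe location right after installation is a sensible precaution. The sketch below assumes the key is defined as a KEY=VALUE line in an env file; the file location is not stated here, so the path is left to the caller, and read_env_value and backup_key are hypothetical helper names.

```python
import os
from pathlib import Path


def read_env_value(env_file, key="ENCRYPTION_KEY"):
    """Return the value for `key` from a KEY=VALUE file, or None if absent."""
    for raw in Path(env_file).read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values containing "=" survive.
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip('"').strip("'")
    return None


def backup_key(env_file, backup_file, key="ENCRYPTION_KEY"):
    """Write `key` and its value to `backup_file` and return that path."""
    value = read_env_value(env_file, key)
    if not value:
        raise ValueError(f"{key} not found in {env_file}")
    dest = Path(backup_file)
    # Request owner-only permissions when the file is newly created.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(f"{key}={value}\n")
    return dest
```

The backup file holds the key in plain text, so it belongs in a password manager or encrypted volume rather than next to the installation.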

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 68
  • Issues (30d): 4
  • Star history: 388 stars in the last 90 days
