unstract by Zipstack

No-code platform for structured document extraction via LLMs

Created 1 year ago
5,779 stars

Top 8.9% on SourcePulse

Project Summary

Unstract is a no-code platform that uses Large Language Models (LLMs) to turn unstructured documents into structured data. Users build and deploy APIs or ETL pipelines for data extraction. It targets developers and business users who want to automate document processing workflows.

How It Works

Unstract uses a three-step "nirvana" process. First, users engineer prompts in a dedicated "Prompt Studio" to extract the desired fields from documents. The studio is an integrated environment for testing prompts against sample documents, comparing LLM outputs, and developing the output schema. Second, the Prompt Studio project is configured either as a standalone API or as an ETL pipeline with a specified input source and output destination. Third, the configured workflow is deployed, so that incoming documents are structured automatically.
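
To make the idea of "fields", "prompts", and "schema" concrete, here is a minimal Python sketch of the data such a project might describe. It is not Unstract's API: FieldSpec, INVOICE_FIELDS, and validate_extraction are hypothetical names made up for illustration, and the sketch assumes the LLM replies with a JSON object.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field to extract: the prompt sent to the LLM and the expected type."""
    name: str
    prompt: str
    expected_type: type


# Hypothetical invoice schema; field names and prompts are illustrative only.
INVOICE_FIELDS = [
    FieldSpec("invoice_number", "What is the invoice number?", str),
    FieldSpec("total_amount", "What is the total amount due, as a number?", float),
]


def validate_extraction(raw_json: str, fields: list[FieldSpec]) -> dict:
    """Parse an LLM's JSON reply and keep only the fields named in the schema.

    Raises ValueError if a field is missing or has the wrong type.
    """
    data = json.loads(raw_json)
    result = {}
    for spec in fields:
        if spec.name not in data:
            raise ValueError(f"missing field: {spec.name}")
        value = data[spec.name]
        # JSON has a single number type, so accept ints where floats are expected.
        if spec.expected_type is float and isinstance(value, int) and not isinstance(value, bool):
            value = float(value)
        if not isinstance(value, spec.expected_type):
            raise ValueError(f"wrong type for {spec.name}: {type(value).__name__}")
        result[spec.name] = value
    return result
```

Checking the reply against the schema before it reaches an output destination is one way to keep malformed LLM output out of downstream tables; whether and how Unstract does this internally is not stated in the summary.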

Quick Start & Requirements

  • Install: Clone the repository and run ./run-platform.sh.
  • Prerequisites: Linux or macOS (Intel/M-series), Docker, Docker Compose.
  • Access: Visit http://frontend.unstract.localhost with username unstract and password unstract.
  • Documentation: Quick Start Guide available for initial setup and prompt engineering.
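
Before running ./run-platform.sh, it can help to confirm that the host meets the prerequisites listed above. The following is a minimal Python sketch, not part of Unstract; check_prerequisites is a hypothetical helper, and it only checks the operating system and whether a docker executable is on PATH.

```python
import platform
import shutil
from typing import Callable, Optional


def check_prerequisites(
    system: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return a list of problems; an empty list means the host looks ready."""
    system = system or platform.system()
    problems = []
    # platform.system() reports "Linux" on Linux and "Darwin" on macOS.
    if system not in ("Linux", "Darwin"):
        problems.append(f"unsupported OS: {system}")
    # Docker Compose v2 ships as a `docker compose` plugin; confirming it
    # would require running `docker compose version`, which is omitted here.
    if which("docker") is None:
        problems.append("docker not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print(problem)
```

The system and which parameters exist only so the check can be exercised without depending on the machine it runs on.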

Highlighted Details

  • Supports multiple LLM providers (OpenAI, Google VertexAI, Anthropic, etc.).
  • Integrates with multiple vector databases (Qdrant, Weaviate, Pinecone) and embedding models.
  • Supports numerous text extractors (Unstructured.io, LlamaIndex Parse) and ETL sources/destinations (AWS S3, GCS, Snowflake, BigQuery).
  • Offers a hosted version with a 14-day free trial.

Maintenance & Community

  • Community contributions are welcomed via CONTRIBUTING.md.
  • Active community presence on Slack and social media (X/Twitter, LinkedIn).

Licensing & Compatibility

  • The README does not explicitly state the license.

Limitations & Caveats

  • Backup of the ENCRYPTION_KEY is critical; its loss or change will render existing adapters inaccessible.
  • Usage analytics are integrated via Posthog but can be disabled.
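
Because losing the ENCRYPTION_KEY makes existing adapters inaccessible, one option is to copy it somewhere safe right after setup. The sketch below assumes the key is stored as an ENCRYPTION_KEY=... line in a dotenv-style file; the file location is not given in this summary, so both paths are parameters the caller must supply, and read_env_value and backup_encryption_key are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the first value assigned to `key` in dotenv-style text, or None."""
    for line in env_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes.
            return value.strip().strip("\"'")
    return None


def backup_encryption_key(env_file: Path, backup_file: Path) -> None:
    """Copy the ENCRYPTION_KEY entry from env_file into backup_file."""
    value = read_env_value(env_file.read_text(), "ENCRYPTION_KEY")
    if value is None:
        raise KeyError(f"ENCRYPTION_KEY not set in {env_file}")
    backup_file.write_text(f"ENCRYPTION_KEY={value}\n")
    # The backup holds a secret, so restrict it to the owner.
    backup_file.chmod(0o600)
```

The backup file should live outside the cloned repository, for example in a password manager or an encrypted volume, so that it survives a fresh checkout.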

Health Check

  • Last commit: 14 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 62
  • Issues (30d): 3
  • Star history: 118 stars in the last 30 days
