sycamore by aryn-ai

LLM-powered platform for unstructured data search and analytics

Created 3 years ago

601 stars

Top 53.6% on SourcePulse

2 Experts Love This Project

Elie Bursztein

Cybersecurity Lead at Google DeepMind

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

Sycamore is an AI-powered platform for processing, analyzing, and enriching unstructured documents. It targets engineers and researchers building ETL pipelines, RAG systems, and LLM applications, and aims to improve chunking quality and retrieval recall across diverse document types.

How It Works

Sycamore uses Aryn DocParse, a GPU-powered API built on a DETR (Detection Transformer) model trained on enterprise documents, for document segmentation, OCR, and table extraction. The goal is more accurate chunking and higher recall in hybrid search and RAG than other systems provide. The platform is built around a DocSet abstraction, which enables scalable, functional data transformations and reliable loading into various vector databases.
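To make the idea of functional transformations over a set of documents concrete, here is a minimal sketch in plain Python. It is not Sycamore's API: ToyDocSet, Doc, and split_paragraphs are hypothetical names invented for illustration, and the "vector database" is just a list. The sketch only shows the general pattern of chaining lazy transformations and running them when the result is written out.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Doc:
    """A toy document: some text plus free-form properties."""
    text: str
    properties: dict = field(default_factory=dict)


class ToyDocSet:
    """Hypothetical stand-in for a DocSet-like pipeline (not Sycamore's class).

    Each transformation returns a new ToyDocSet and does no work until
    write() pulls documents through the whole chain.
    """

    def __init__(self, source: Callable[[], Iterable[Doc]]):
        self._source = source  # called only when the pipeline is executed

    def map(self, fn: Callable[[Doc], Doc]) -> "ToyDocSet":
        return ToyDocSet(lambda: (fn(d) for d in self._source()))

    def filter(self, pred: Callable[[Doc], bool]) -> "ToyDocSet":
        return ToyDocSet(lambda: (d for d in self._source() if pred(d)))

    def flat_map(self, fn: Callable[[Doc], Iterable[Doc]]) -> "ToyDocSet":
        return ToyDocSet(lambda: (out for d in self._source() for out in fn(d)))

    def write(self, sink: List[Doc]) -> None:
        """Execute the pipeline and append the results to a list-based sink."""
        sink.extend(self._source())


def split_paragraphs(doc: Doc) -> List[Doc]:
    """Naive chunker: one output document per non-empty paragraph."""
    parts = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
    return [Doc(p, {**doc.properties, "chunk": i}) for i, p in enumerate(parts)]


raw = [
    Doc("Intro paragraph.\n\nRevenue table follows.", {"path": "a.txt"}),
    Doc("", {"path": "empty.txt"}),
]

pipeline = (
    ToyDocSet(lambda: raw)
    .filter(lambda d: d.text.strip() != "")   # drop empty documents
    .flat_map(split_paragraphs)               # chunk each document
    .map(lambda d: Doc(d.text, {**d.properties, "n_chars": len(d.text)}))
)

store: List[Doc] = []   # stands in for a vector database
pipeline.write(store)   # the chain runs only here
```

Because every step returns a new pipeline object rather than mutating data in place, steps can be reordered or reused freely. A real system would replace the list sink with a database writer and the naive paragraph splitter with model-driven segmentation.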

Quick Start & Requirements

Install via pip: pip install sycamore-ai
Install vector database connectors with extras, e.g., pip install "sycamore-ai[duckdb]" (the quotes stop shells such as zsh from treating the brackets as a glob pattern)
Requires Linux or macOS.
An Aryn DocParse API key is needed for cloud processing.
Documentation: https://sycamore.readthedocs.io
Example notebook: https://github.com/aryn-ai/sycamore/blob/main/notebooks/sycamore-tutorial-intermediate-etl.ipynb
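The requirements above can be checked before a first run. The following is a small preflight sketch, not part of Sycamore itself: the function name preflight is hypothetical, and the environment variable name ARYN_API_KEY is an assumption that should be confirmed against the Aryn documentation.

```python
import os
import platform
from typing import List, Mapping, Optional

# platform.system() reports macOS as "Darwin".
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}


def preflight(
    system: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    key_var: str = "ARYN_API_KEY",  # assumed variable name, not confirmed by the source
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    system = platform.system() if system is None else system
    env = os.environ if env is None else env

    problems = []
    if system not in SUPPORTED_SYSTEMS:
        problems.append(f"unsupported OS: {system}")
    if not env.get(key_var):
        problems.append(f"{key_var} is not set (needed only for cloud processing)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("warning:", problem)
```

Passing system and env explicitly keeps the check easy to test; calling preflight() with no arguments inspects the current machine.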

Highlighted Details

Integrates Aryn DocParse, which uses a vision AI model to preserve the semantic structure of documents.
DocSet abstraction for scalable, functional document manipulation.
Supports high-quality table extraction, OCR, visual summarization, and LLM-powered UDFs.
Includes automatic data crawlers (S3, HTTP) and an OpenSearch RAG engine.
Scalable backend powered by Ray.

Maintenance & Community

Active development with a Slack community for support and discussion: https://join.slack.com/t/sycamore-ulj8912/shared_invite/zt-23sv0yhgy-MywV5dkVQ~F98Aoejo48Jg

Licensing & Compatibility

PyPI package sycamore-ai is released under the Apache 2.0 license.

Limitations & Caveats

Primarily designed for Linux and macOS; Windows support is not explicitly mentioned.
Relies on Aryn DocParse for advanced document parsing; DocParse is available as a cloud API or as a local option.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Star History

1 star in the last 30 days