sycamore by aryn-ai

LLM-powered platform for unstructured data search and analytics

Created 2 years ago
561 stars

Top 57.2% on SourcePulse

Project Summary

Sycamore is an AI-powered platform for processing, analyzing, and enriching unstructured documents. It targets engineers and researchers building ETL pipelines, RAG systems, and LLM applications, and aims to improve chunking quality and retrieval recall across diverse document types.

How It Works

Sycamore uses Aryn DocParse, a GPU-powered API built on a DETR (Detection Transformer) model trained on enterprise documents, for document segmentation, OCR, and table extraction. The goal is more accurate chunking, and therefore better recall in hybrid search and RAG, than other systems provide. The platform is built around a DocSet abstraction, which supports scalable, functional-style data transformations and reliable loading into various vector databases.
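To make the idea of "functional data transformations" over a document collection concrete, here is a minimal, self-contained sketch in plain Python. It does not use Sycamore and is not its API: `Doc`, `ToyDocSet`, `split_paragraphs`, and the property names are hypothetical names chosen for illustration only. It shows the general pattern of chaining lazy `filter`, `flat_map`, and `map` steps, each returning a new collection.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record: text plus free-form properties."""
    text: str
    properties: dict = field(default_factory=dict)


class ToyDocSet:
    """Illustrative lazy document collection (NOT Sycamore's DocSet class)."""

    def __init__(self, source: Callable[[], Iterable[Doc]]):
        # The source is a zero-argument callable so the pipeline stays lazy.
        self._source = source

    def map(self, fn: Callable[[Doc], Doc]) -> "ToyDocSet":
        return ToyDocSet(lambda: (fn(d) for d in self._source()))

    def filter(self, pred: Callable[[Doc], bool]) -> "ToyDocSet":
        return ToyDocSet(lambda: (d for d in self._source() if pred(d)))

    def flat_map(self, fn: Callable[[Doc], Iterable[Doc]]) -> "ToyDocSet":
        return ToyDocSet(lambda: (out for d in self._source() for out in fn(d)))

    def take_all(self) -> List[Doc]:
        # Nothing runs until the results are materialized here.
        return list(self._source())


def split_paragraphs(doc: Doc) -> List[Doc]:
    """Naive chunker: one chunk per blank-line-separated paragraph."""
    return [
        Doc(p.strip(), {**doc.properties, "chunk": i})
        for i, p in enumerate(doc.text.split("\n\n"))
        if p.strip()
    ]


docs = [
    Doc("Intro.\n\nTable 1 shows revenue.", {"path": "a.pdf"}),
    Doc("", {"path": "empty.pdf"}),
]

chunks = (
    ToyDocSet(lambda: docs)
    .filter(lambda d: bool(d.text))          # drop empty documents
    .flat_map(split_paragraphs)              # one document -> many chunks
    .map(lambda d: Doc(d.text, {**d.properties, "n_chars": len(d.text)}))
    .take_all()
)

for c in chunks:
    print(c.properties, repr(c.text))
```

Because each step returns a new collection rather than mutating the old one, steps can be reordered or distributed across workers without shared state. A real pipeline would replace the naive paragraph splitter with a model-based partitioner and finish by writing to a vector database.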

Quick Start & Requirements

Highlighted Details

  • Integrates Aryn DocParse with a vision AI model for semantic document structure preservation.
  • DocSet abstraction for scalable, functional document manipulation.
  • Supports high-quality table extraction, OCR, visual summarization, and LLM-powered UDFs.
  • Includes automatic data crawlers (S3, HTTP) and an OpenSearch RAG engine.
  • Scalable backend powered by Ray.

Maintenance & Community

Licensing & Compatibility

  • PyPI package sycamore-ai is released under the Apache 2.0 license.

Limitations & Caveats

  • Primarily designed for Linux and macOS; Windows support is not explicitly mentioned.
  • Relies on Aryn DocParse, available as a cloud API or a local option, for advanced document parsing.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 29
  • Issues (30d): 0

Star History

  • 7 stars in the last 30 days