sparrow by katanaml

Data processing & instruction calling tool using ML, LLM, and Vision LLM

Created 4 years ago

5,086 stars

Top 9.8% on SourcePulse

3 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Jerry Liu

Cofounder of LlamaIndex

Pawel Garbacki

Cofounder of Fireworks AI

Project Summary

Sparrow is an open-source framework for universal document processing, offering data extraction, instruction calling, and workflow orchestration powered by ML, LLMs, and Vision LLMs. It targets developers and power users needing to automate the extraction and processing of structured data from diverse document types like invoices, receipts, and tables, providing a flexible, API-first solution.

How It Works

Sparrow employs a pluggable architecture with distinct pipelines: Sparrow Parse for Vision LLM-based document extraction, Sparrow Instructor for text-based instruction processing, and Sparrow Agents for orchestrating complex multi-step workflows. It supports multiple backends including MLX (Apple Silicon), Ollama, vLLM, PyTorch, and Hugging Face Cloud GPUs, enabling flexible deployment and hardware optimization. The system extracts data into structured JSON format, with optional schema validation and bounding box annotations.
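
To make the "structured JSON with optional schema validation" step concrete, here is a minimal sketch of what validating an extraction result could look like. It does not use Sparrow's own API: the schema format, the function name validate_extraction, and the sample invoice fields are all assumptions chosen for illustration.

```python
import json

# Hypothetical schema (not Sparrow's format): field name -> expected Python type.
INVOICE_SCHEMA = {"invoice_number": str, "total": float, "items": list}


def validate_extraction(raw_json, schema):
    """Parse a JSON string and return a list of schema problems (empty if valid)."""
    data = json.loads(raw_json)
    problems = []
    for field, expected in schema.items():
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            actual = type(data[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems


# Illustrative payload standing in for the JSON a document pipeline might return.
sample = '{"invoice_number": "INV-001", "total": 120.5, "items": []}'
print(validate_extraction(sample, INVOICE_SCHEMA))
```

Returning a list of problems instead of raising on the first one lets a caller report every mismatch in a document at once, which tends to be more useful when tuning extraction prompts.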

Quick Start & Requirements

Install: Clone the repository, set up Python 3.10.4+ via pyenv, create virtual environments, and install pipeline-specific requirements (e.g., pip install -r requirements_sparrow_parse.txt).
Prerequisites: macOS, Linux, or Windows. Poppler is required for PDF processing (e.g., brew install poppler on macOS). A GPU is recommended for performance.
Demo: Try Sparrow Online at sparrow.katanaml.io.
Docs: Detailed setup guide and API documentation available.
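
The prerequisites above can be checked before installing anything. The following is a small preflight sketch, not part of Sparrow itself: the function name check_prerequisites is hypothetical, and it assumes Poppler is detectable through its pdftoppm command-line tool being on PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 10, 4)  # minimum version named in the install step


def check_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means the basics are in place."""
    version = tuple(version or sys.version_info[:3])
    problems = []
    if version < MIN_PYTHON:
        found = ".".join(map(str, version))
        problems.append(f"Python {found} is older than 3.10.4")
    # pdftoppm is one of the command-line tools shipped with Poppler.
    if which("pdftoppm") is None:
        problems.append("poppler not found on PATH (pdftoppm missing)")
    return problems


# Report any problems found on the current machine.
for problem in check_prerequisites():
    print(problem)
```

The version and which parameters exist only so the check can be exercised without changing the real environment; called with no arguments, it inspects the running interpreter and the actual PATH.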

Highlighted Details

Universal document processing for invoices, receipts, forms, bank statements, and tables.
Pluggable architecture supporting Sparrow Parse (Vision LLM), Instructor (Text LLM), and Agents (Workflows).
Multiple backends: MLX (Apple Silicon), Ollama, vLLM, PyTorch, Hugging Face Cloud GPU.
API-first design with RESTful APIs and interactive Swagger documentation.
Sparrow Agent for complex workflow orchestration with visual monitoring via Prefect.

Maintenance & Community

The project is led by Andrej Baranovskij and Katana ML. Community support is available via GitHub Issues. Commercial support and licensing are offered via abaranovskis@redsamuraiconsulting.com.

Licensing & Compatibility

Licensed under GPL 3.0. Use is free for open-source projects and for organizations with under $5M in revenue. Dual licensing is available for proprietary use, enterprise features, and dedicated support.

Limitations & Caveats

CPU-only configurations run significantly slower than GPU-backed ones. MLX is optimized for Apple Silicon, while the other backends may require specific GPU/CUDA setups. The README advertises "Enterprise Ready" features such as rate limiting and usage analytics, but it is not clear from the README how they are implemented.

Health Check

Last commit: 2 days ago
Responsiveness: 1 day
Star history: 28 stars in the last 30 days