docetl by ucbepic

Agentic LLM-powered system for data processing and ETL

Created 1 year ago
3,029 stars

Top 15.7% on SourcePulse

Project Summary

DocETL is a system for building and executing LLM-powered data processing pipelines, particularly for complex document tasks. It targets developers and researchers needing an interactive environment for prompt engineering and a production-ready Python package for pipeline execution, offering iterative development and automated data transformation.

How It Works

DocETL uses a pipeline architecture in which each step is an operator configured with an LLM prompt. Operators can be chained into workflows for tasks such as data extraction, summarization, and transformation. Development is meant to be iterative: the interactive UI, DocWrangler, supports real-time prompt testing and pipeline visualization before a pipeline is exported for production use.
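The operator-chaining idea can be illustrated with a minimal plain-Python sketch. It does not use the docetl package, which has its own pipeline configuration format and operator set. `MapOperator`, `run_pipeline`, and `fake_llm` are hypothetical names, and the stubbed LLM stands in for a real provider call so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical LLM interface: takes a rendered prompt, returns the reply text.
# A real pipeline would call a provider here (e.g. through liteLLM).
LLMFn = Callable[[str], str]


@dataclass
class MapOperator:
    """Hypothetical operator: renders a prompt per record and stores the reply."""

    name: str
    prompt_template: str  # str.format template, e.g. "Summarize: {text}"
    output_key: str       # field that receives the LLM reply

    def run(self, records: list[dict], llm: LLMFn) -> list[dict]:
        results = []
        for record in records:
            prompt = self.prompt_template.format(**record)
            # Copy the record so earlier pipeline stages are not mutated.
            results.append({**record, self.output_key: llm(prompt)})
        return results


def run_pipeline(records: list[dict], operators: list[MapOperator], llm: LLMFn) -> list[dict]:
    # Chain operators: each one consumes the records produced by the previous one.
    for op in operators:
        records = op.run(records, llm)
    return records


def fake_llm(prompt: str) -> str:
    # Deterministic stand-in so the sketch runs without an API key.
    if prompt.startswith("Summarize:"):
        return prompt.removeprefix("Summarize:").split()[0]
    return "legal" if "contract" in prompt.lower() else "other"


docs = [{"text": "Contract renewal due in May"}, {"text": "Lunch menu for Friday"}]
ops = [
    MapOperator("summarize", "Summarize: {text}", "summary"),
    MapOperator("classify", "Classify: {summary}", "label"),
]
result = run_pipeline(docs, ops, fake_llm)
print(result)
```

Because the second operator's template reads the `summary` field written by the first, the output of one step becomes the input of the next, which is the essence of chaining.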

Quick Start & Requirements

  • Interactive UI (DocWrangler):
    • Docker (recommended): make docker
    • Manual Setup: clone the repository, create the .env and .env.local files, then run make install, make install-ui, and make run-ui-dev. The UI is then available at http://localhost:3000/playground.
  • Python Package:
    • Install: pip install docetl
    • Prerequisites: Python 3.10+ and an OpenAI API key (or credentials for another LLM provider supported through liteLLM).
  • AWS Bedrock Support: Requires AWS credentials configured via aws configure or environment variables.
  • Resources: Local setup requires Docker or a manual installation of the Python dependencies. Running pipelines requires LLM API access, which incurs usage costs.
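Before running a pipeline, it can help to confirm that provider credentials are visible. The sketch below is a rough preflight check, not part of DocETL; `missing_credentials` is a hypothetical helper. It assumes the standard conventions that OpenAI clients read `OPENAI_API_KEY` and that AWS SDKs read `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` or the shared credentials file written by `aws configure`. AWS also supports other sources (profiles, SSO, instance roles), so a "missing" result there is only a hint.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Assumed credential sources (standard conventions, not DocETL-specific):
# - OpenAI clients read OPENAI_API_KEY.
# - AWS SDKs read AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY, or the
#   shared credentials file that `aws configure` writes.
DEFAULT_AWS_CREDENTIALS = Path.home() / ".aws" / "credentials"


def missing_credentials(
    provider: str,
    env: Optional[Mapping[str, str]] = None,
    aws_credentials_file: Path = DEFAULT_AWS_CREDENTIALS,
) -> list[str]:
    """Return the names of credentials that look missing for `provider`."""
    env = os.environ if env is None else env
    if provider == "openai":
        return [] if env.get("OPENAI_API_KEY") else ["OPENAI_API_KEY"]
    if provider == "bedrock":
        if aws_credentials_file.exists():
            return []  # assume `aws configure` has been run
        return [k for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY") if not env.get(k)]
    raise ValueError(f"unknown provider: {provider!r}")


if __name__ == "__main__":
    for name in ("openai", "bedrock"):
        missing = missing_credentials(name)
        print(name, "OK" if not missing else f"missing: {', '.join(missing)}")
```

Passing `env` and `aws_credentials_file` explicitly keeps the check testable without touching the real environment.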

Highlighted Details

  • Interactive UI (DocWrangler) for iterative prompt engineering and pipeline development.
  • Production-ready Python package for command-line or programmatic pipeline execution.
  • Supports OpenAI and AWS Bedrock, with other LLM providers available through liteLLM.
  • Includes community projects and educational resources for learning and contribution.

Maintenance & Community

The project is hosted on GitHub with community contributions encouraged. Links to community discussions or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state the license. It is crucial to verify the license for commercial use or integration into closed-source projects.

Limitations & Caveats

The system relies heavily on LLM APIs, which incur costs and can introduce variability in output. For provider-specific configuration and model compatibility, the README points to the liteLLM documentation. The project appears to be under active development, so breaking changes may occur.
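One common way to cope with variable LLM output is to validate each response and retry a bounded number of times. The sketch below is a generic illustration, not DocETL's own mechanism; `call_with_validation` and `flaky_llm` are hypothetical, and it assumes the model is prompted to answer with a JSON object.

```python
import json
from typing import Callable, Iterable


def call_with_validation(
    llm: Callable[[str], str],
    prompt: str,
    required_keys: Iterable[str],
    max_attempts: int = 3,
) -> dict:
    """Call `llm` until it returns a JSON object containing `required_keys`.

    `max_attempts` also caps how many paid API calls one record can trigger.
    """
    required = list(required_keys)
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = llm(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            last_error = f"attempt {attempt}: invalid JSON ({exc.msg})"
            continue
        if not isinstance(data, dict):
            last_error = f"attempt {attempt}: expected a JSON object"
            continue
        missing = [k for k in required if k not in data]
        if missing:
            last_error = f"attempt {attempt}: missing keys {missing}"
            continue
        return data
    raise ValueError(f"no valid output after {max_attempts} attempts; {last_error}")


# Hypothetical flaky model: malformed, then incomplete, then valid output.
replies = iter(["not json", '{"title": "Q3 report"}', '{"title": "Q3 report", "year": 2024}'])
flaky_llm = lambda prompt: next(replies)
print(call_with_validation(flaky_llm, "Extract title and year as JSON.", ["title", "year"]))
```

Bounding retries trades a little extra cost for more reliable structured output; raising after the last attempt keeps a bad record from silently passing through.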

Health Check

Last Commit: 1 week ago
Responsiveness: 1 day
Pull Requests (30d): 4
Issues (30d): 0
Star History: 82 stars in the last 30 days

Explore Similar Projects

Starred by Luis Capelo (Cofounder of Lightning AI), Addy Osmani (Head of Chrome Developer Experience at Google), and 23 more.

goose by block

4.3%
22k
Open-source AI agent for automating complex engineering tasks
Created 1 year ago
Updated 9 hours ago