finic-ai
LLM tool for document transformation using natural language instructions
Top 61.5% on SourcePulse
Doctran is a Python framework for transforming unstructured text into structured data using Large Language Models (LLMs). It targets developers and researchers needing to process complex text for tasks like data labeling or semantic information extraction, offering a modular, declarative wrapper around OpenAI's function calling feature to simplify LLM interactions.
How It Works
Doctran acts as an LLM-powered processing pipeline, taking messy strings as input and producing clean, structured output. It leverages OpenAI's function calling capabilities to extract data based on provided JSON schemas and offers built-in transformers for common tasks like redaction (using spaCy locally), summarization, refinement, translation, and interrogation (converting text to Q&A pairs). The framework supports chaining these transformations in a specified order, allowing for complex multi-step processing workflows.
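The chaining behavior described above can be pictured with a small, self-contained sketch. This is not Doctran's implementation or API: the class SketchDocument and its methods redact_emails, truncate, and execute are hypothetical names invented for illustration, and the steps are simple local stand-ins (a regex and a word limit) rather than spaCy or OpenAI calls. It only shows the general pattern of queuing transformations and applying them in the order they were chained.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List

# A step is any function that maps text to text.
Step = Callable[[str], str]


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a parsed document; not Doctran's class."""

    content: str
    _steps: List[Step] = field(default_factory=list)

    def redact_emails(self) -> "SketchDocument":
        # Queue a local regex redaction (stand-in for a spaCy-based redactor).
        pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
        self._steps.append(lambda text: pattern.sub("<EMAIL>", text))
        return self

    def truncate(self, max_words: int) -> "SketchDocument":
        # Queue a crude word limit (stand-in for LLM summarization).
        self._steps.append(lambda text: " ".join(text.split()[:max_words]))
        return self

    def execute(self) -> "SketchDocument":
        # Apply the queued steps in the order they were chained.
        text = self.content
        for step in self._steps:
            text = step(text)
        return SketchDocument(content=text)


doc = SketchDocument("Contact jane.doe@example.com for the full quarterly report details")
result = doc.redact_emails().truncate(max_words=4).execute()
print(result.content)
```

Nothing runs until execute() is called, so the same queued pipeline can be inspected or reordered before any work is done. That is the main appeal of a deferred, declarative chain when each step may be a paid LLM call.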
Quick Start & Requirements
Install:
pip install doctran

Initialize and parse a document:
from doctran import Doctran
doctran = Doctran(openai_api_key=OPENAI_API_KEY)
document = doctran.parse(content="your_content_as_string")

Examples: examples.ipynb.
Highlighted Details
Built-in transformers: redact, extract, summarize, refine, translate, and interrogate.
Custom transformers can extend DocumentTransformer or OpenAIDocumentTransformer.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The summarize transformer notes that token_limit may not be strictly respected by OpenAI.
1 year ago
Inactive