kor by eyurtsev

LLM wrapper for structured data extraction

Created 2 years ago

1,699 stars

Top 24.7% on SourcePulse

3 Experts Love This Project

Travis Fischer

Founder of Agentic

Jeff Hammerbacher

Cofounder of Cloudera

Tomas Valenta

Cofounder of E2B

Project Summary

Kor is a Python library for extracting structured data from text using Large Language Models (LLMs), particularly models without native tool-calling capabilities. It targets developers who need to parse unstructured text into predefined schemas, and it offers an alternative to the tool-calling features of newer chat model APIs.

How It Works

Kor generates a prompt that includes a user-defined schema and examples, sends it to a specified LLM, and then parses the LLM's output. It supports two schema definition styles: Kor's own Object and Text definitions, and Pydantic models. The advantage of this approach is that it works with any LLM that can follow a prompt and generate text, whether or not the model supports advanced features such as JSON mode or function calling.
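The prompt, call, and parse steps described above can be sketched in plain Python. This is a minimal illustration of the general technique, not Kor's actual implementation or API: the names build_prompt, parse_output, extract, and fake_llm are hypothetical, the schema is a plain dict rather than a Kor Object or a Pydantic model, and the model call is replaced by a stub so the example runs offline.

```python
import json
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, Dict[str, str]]


def build_prompt(schema: Dict[str, str], examples: List[Example], text: str) -> str:
    """Assemble a prompt from field descriptions, worked examples, and the input text."""
    lines = ["Extract the fields described below and reply with JSON only.", "Fields:"]
    for name, description in schema.items():
        lines.append(f"- {name}: {description}")
    for example_text, example_output in examples:
        lines.append(f"Input: {example_text}")
        lines.append(f"Output: {json.dumps(example_output)}")
    lines.append(f"Input: {text}")
    lines.append("Output:")
    return "\n".join(lines)


def parse_output(raw: str) -> dict:
    """Parse the model's reply; return an empty dict if it is not a JSON object."""
    try:
        data = json.loads(raw.strip())
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def extract(llm: Callable[[str], str], schema: Dict[str, str],
            examples: List[Example], text: str) -> dict:
    """Run the three steps: build the prompt, call the model, parse the reply."""
    return parse_output(llm(build_prompt(schema, examples, text)))


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call; always returns the same reply.
    return '{"artist": "Paul Simon"}'


schema = {"artist": "Name of the musical artist mentioned in the text"}
examples = [("play songs by Eminem", {"artist": "Eminem"})]
result = extract(fake_llm, schema, examples, "play something by Paul Simon")
print(result)  # {'artist': 'Paul Simon'}
```

Because the model is only asked to produce text, any callable that maps a prompt string to a reply string can be substituted for fake_llm, which is the property that lets this style of extraction work without JSON mode or function calling.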

Quick Start & Requirements

Install via pip: pip install kor
Tested with Python 3.8-3.11.
Requires an LLM API key (e.g., OpenAI).
Documentation: https://kor.readthedocs.io/en/latest/

Highlighted Details

Supports both Kor's custom schema and Pydantic v1/v2 models for defining extraction targets.
Offers flexibility by working with LLMs that lack native tool-calling or JSON modes.
Can be integrated with the LangChain framework.
Extraction quality depends on the choice of LLM; larger models are recommended for better results, at the cost of speed.

Maintenance & Community

The project is marked as a "half-baked prototype" with an unstable API.
Users are encouraged to open issues for discussion and feature requests.
Promptify and MiniChain are suggested as alternatives.

Licensing & Compatibility

The README does not explicitly state a license.
Compatibility with commercial use or closed-source linking is not specified.

Limitations & Caveats

Kor is a prototype with an unstable API. It can be slow, and it may crash on long text inputs that exceed the LLM's context window. Extraction quality depends heavily on the quality of the provided examples and schema documentation.
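A common way to stay within a context window is to split long input into smaller pieces and run extraction on each piece separately. The sketch below is one such approach under stated assumptions, not a feature of Kor: split_into_chunks is a hypothetical helper, and it uses a character budget as a rough proxy for the model's token limit.

```python
from typing import List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of up to max_chars."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            # A single word longer than max_chars is kept whole rather than cut.
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


chunks = split_into_chunks("alpha beta gamma delta", max_chars=11)
print(chunks)  # ['alpha beta', 'gamma delta']
```

Each chunk would then be sent through extraction on its own and the results merged. Splitting on word boundaries can separate related facts, so a real pipeline may prefer sentence or paragraph boundaries.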

Health Check

Last Commit

11 months ago

Responsiveness

Inactive

Star History

2 stars in the last 30 days