instructor-js by 567-labs

TypeScript tool for structured extraction from LLMs

Created 2 years ago

763 stars

Top 45.8% on SourcePulse

4 Experts Love This Project

John Resig

Author of jQuery; Chief Software Architect at Khan Academy

Project Summary

This library provides structured data extraction from Large Language Models (LLMs) using TypeScript, OpenAI's function calling API, and Zod for schema validation. It's designed for developers needing to reliably parse LLM outputs into typed data structures, offering simplicity, transparency, and control over the extraction process.

How It Works

Instructor extends the OpenAI SDK client, enabling structured extraction by leveraging Zod schemas. It supports multiple modes (TOOLS, JSON, MD_JSON, JSON_SCHEMA) to guide LLM output formatting. The core mechanism involves passing a Zod schema to the response_model parameter in chat.completions.create, allowing the LLM to generate output that conforms to the defined structure, which is then validated and parsed by Zod.
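
The validate-then-parse step described above can be illustrated without any dependencies. This is a minimal sketch, not Instructor's implementation: a hand-written check stands in for the Zod schema, and the names User, ParseResult, extractJson, and parseUser are hypothetical. It assumes a reply that may wrap its JSON in a markdown code fence, as an MD_JSON-style mode would request.

```typescript
// Hypothetical target type. With Instructor it would be inferred from a Zod schema.
interface User {
  name: string;
  age: number;
}

type ParseResult<T> = { ok: true; data: T } | { ok: false; error: string };

// Three backticks, written as escapes so this example can sit inside a fenced block.
const FENCE = "\u0060\u0060\u0060";
const FENCED_BLOCK = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

// Pull the JSON payload out of a reply that may wrap it in a markdown code fence.
function extractJson(reply: string): string {
  const match = FENCED_BLOCK.exec(reply);
  return (match?.[1] ?? reply).trim();
}

// Parse the reply and check every field before handing back a typed value.
function parseUser(reply: string): ParseResult<User> {
  let raw: unknown;
  try {
    raw = JSON.parse(extractJson(reply));
  } catch {
    return { ok: false, error: "reply is not valid JSON" };
  }
  if (typeof raw !== "object" || raw === null) {
    return { ok: false, error: "expected a JSON object" };
  }
  const { name, age } = raw as Record<string, unknown>;
  if (typeof name !== "string") {
    return { ok: false, error: "name must be a string" };
  }
  if (typeof age !== "number" || !isFinite(age)) {
    return { ok: false, error: "age must be a number" };
  }
  return { ok: true, data: { name, age } };
}
```

The caller receives either a typed User or an explicit error, never an unchecked object. In Instructor the same role is played by the Zod schema passed through response_model.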

Quick Start & Requirements

Install with one of: bun add @instructor-ai/instructor zod openai, npm i @instructor-ai/instructor zod openai, or pnpm add @instructor-ai/instructor zod openai.
Requires a Node.js environment and an OpenAI API key; API keys for other providers are optional.
Official documentation: https://github.com/567-labs/instructor-js

Highlighted Details

Supports streaming of partial extraction results.
Integrates with various LLM providers (Anyscale, Together, Anthropic, Azure, Cohere) via llm-polyglot.
Built on Island AI toolkit packages: zod-stream, schema-stream, llm-polyglot.
Leverages Zod for robust, customizable data validation.
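
Streaming partial results comes down to parsing a JSON prefix before the model has finished writing it. The sketch below shows one dependency-free way to do that, by closing whatever quotes and brackets are still open. It is an illustration under stated assumptions, not how zod-stream or schema-stream work internally, and every name in it (Snapshot, closePartialJson, parsePartial, streamSnapshots, fakeLlmStream, collectSnapshots) is hypothetical.

```typescript
type Snapshot = Record<string, unknown>;

// Append the quotes and brackets that are still open so the prefix has a
// chance of parsing. Best effort only: a prefix cut in the middle of a key
// or right after a comma still will not parse.
function closePartialJson(partial: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (const ch of partial) {
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  return partial + (inString ? '"' : "") + closers.reverse().join("");
}

// Return the object visible so far, or undefined if the prefix cannot be parsed yet.
function parsePartial(buffer: string): Snapshot | undefined {
  try {
    const value: unknown = JSON.parse(closePartialJson(buffer));
    return typeof value === "object" && value !== null ? (value as Snapshot) : undefined;
  } catch {
    return undefined;
  }
}

// Accumulate text chunks and yield a snapshot each time the buffer parses.
async function* streamSnapshots(chunks: AsyncIterable<string>): AsyncGenerator<Snapshot> {
  let buffer = "";
  for await (const chunk of chunks) {
    buffer += chunk;
    const snapshot = parsePartial(buffer);
    if (snapshot !== undefined) yield snapshot;
  }
}

// Stand-in for a model's token stream (made-up chunks, no network call).
async function* fakeLlmStream(): AsyncGenerator<string> {
  for (const chunk of ['{"name": "Jas', 'on Liu", ', '"age": 30', "}"]) yield chunk;
}

// Drain the stream into an array so the snapshots can be inspected.
async function collectSnapshots(): Promise<Snapshot[]> {
  const seen: Snapshot[] = [];
  for await (const snapshot of streamSnapshots(fakeLlmStream())) seen.push(snapshot);
  return seen;
}
```

Intermediate snapshots are provisional. A string or number cut off mid-stream parses as a shorter value until the rest arrives, so full schema validation belongs on the final object.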

Maintenance & Community

Developed by Dimitri Kennedy (creator of Island AI) and Jason Liu (author of original Python Instructor).
Community support and contributions are encouraged via GitHub issues.
Sibling implementations are available for Python (the original Instructor) and Elixir.

Licensing & Compatibility

MIT License.
Compatible with commercial use and closed-source applications.

Limitations & Caveats

The library relies on LLM providers correctly implementing OpenAI's API specifications for seamless integration. Specific model capabilities and adherence to tool/function calling formats can influence extraction accuracy.

Health Check

Last Commit

11 months ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days