langchain-extract  by langchain-ai

FastAPI web server for LLM-powered data extraction

Created 1 year ago
1,163 stars

Top 33.2% on SourcePulse

Project Summary

This project provides a FastAPI web server for extracting structured information from text and files using Large Language Models (LLMs). It's designed as a reference implementation and starting point for developers building custom data extraction applications, offering a REST API, JSON schema definition for extraction targets, and support for few-shot examples to improve accuracy.

How It Works

The server leverages LangChain for LLM orchestration and FastAPI for its web framework. Extraction logic is defined via JSON schemas, allowing users to specify the desired output structure. The system supports incorporating few-shot examples, provided via a separate API endpoint, to guide the LLM and enhance the quality of extracted results. It stores extractors and examples in a PostgreSQL database.
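
As a rough illustration of the paragraph above, the sketch below assembles the two JSON bodies a client might send: one defining an extractor through a JSON schema, and one attaching a few-shot example to it. The field names (name, description, schema, instruction, extractor_id, content, output) and the helper functions are assumptions made for illustration, not a confirmed description of the project's API; check the server's OpenAPI documentation for the real request shapes.

```python
import json

# Hypothetical target structure: what the LLM should return for each match.
person_schema = {
    "type": "object",
    "title": "Person",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string", "title": "Name"},
        "age": {"type": "integer", "title": "Age"},
    },
}


def build_extractor_payload(name, description, schema, instruction=""):
    """Assemble a JSON body describing one extractor (field names assumed)."""
    if schema.get("type") != "object":
        raise ValueError("extraction schema should describe a JSON object")
    return {
        "name": name,
        "description": description,
        "schema": schema,
        "instruction": instruction,
    }


def build_example_payload(extractor_id, text, output, schema):
    """Pair an input text with the output we expect (field names assumed)."""
    missing = [key for key in schema.get("required", []) if key not in output]
    if missing:
        raise ValueError(f"example output is missing required fields: {missing}")
    return {"extractor_id": extractor_id, "content": text, "output": [output]}


if __name__ == "__main__":
    extractor = build_extractor_payload(
        "Personal Information",
        "Extract people mentioned in the text",
        person_schema,
    )
    example = build_example_payload(
        "hypothetical-extractor-id",
        "Alice is 30 years old.",
        {"name": "Alice", "age": 30},
        person_schema,
    )
    print(json.dumps(extractor, indent=2))
    print(json.dumps(example, indent=2))
```

Checking the example against the schema's required fields before sending it is a design choice of this sketch: a few-shot example that contradicts the schema would steer the model toward output the extractor cannot use.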

Quick Start & Requirements

  • Install/Run: Run docker compose build, then docker compose up.
  • Prerequisites: An OpenAI API key (required); Fireworks or Together API keys (optional, for additional models).
  • Setup: Requires Docker. API keys are configured in .local.env.
  • Docs: extract.langchain.com
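
Before starting the containers, it can help to confirm that .local.env actually holds the required key. The following is a minimal sketch of such a check. The variable names OPENAI_API_KEY, FIREWORKS_API_KEY, and TOGETHER_API_KEY are assumptions based on common provider conventions; the source only says that keys are configured in .local.env.

```python
from pathlib import Path

# Assumed variable names; the project's own env template is authoritative.
REQUIRED_KEYS = ("OPENAI_API_KEY",)
OPTIONAL_KEYS = ("FIREWORKS_API_KEY", "TOGETHER_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_required(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".local.env")
    values = parse_env(env_path.read_text()) if env_path.exists() else {}
    problems = missing_required(values)
    print("missing required keys:", problems if problems else "none")
    enabled = [key for key in OPTIONAL_KEYS if values.get(key)]
    print("optional providers configured:", enabled if enabled else "none")
```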

Highlighted Details

  • REST API with OpenAPI documentation.
  • Supports extraction from text and binary files (e.g., HTML, PDF).
  • LangServe endpoint for integration with LangChain RemoteRunnable.
  • Ability to create, save, and manage extractors and examples in a database.

Maintenance & Community

This project is under active development by LangChain AI. While pull requests are not currently accepted, feedback via issues and discussions is encouraged.

Licensing & Compatibility

The README as provided does not state a license. Confirm the license terms in the repository before commercial use or closed-source linking.

Limitations & Caveats

The project is under active development, and breaking changes are expected between releases. Do not run the main branch directly; check out a release instead. User authentication is not implemented; access is scoped only by a client-supplied user ID generated with uuidgen.
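
For clients that cannot shell out to uuidgen, an equivalent user ID can be produced in code. The sketch below generates a random UUID and wraps it in a request header. The header name x-key is an assumption for illustration, as are the helper names; the source only states that a uuidgen-style ID identifies the user.

```python
import uuid


def new_user_id():
    """Generate a random (version 4) UUID string, similar to uuidgen output."""
    return str(uuid.uuid4())


def auth_headers(user_id, header_name="x-key"):
    """Build the identifying header; the header name is an assumption."""
    uuid.UUID(user_id)  # raises ValueError if the ID is not a well-formed UUID
    return {header_name: user_id}


if __name__ == "__main__":
    user_id = new_user_id()
    print(auth_headers(user_id))
```

Because the ID is the only thing separating one user's extractors from another's, it should be stored and shared as carefully as a password.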

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 30 days
