ChatLongDoc  by webpilot-ai

Chat with any long document, overcoming LLM length limits

Created 2 years ago
251 stars

Top 99.8% on SourcePulse

Project Summary

ChatLongDoc addresses the context-length limits of OpenAI chat LLMs, letting users converse with and extract insights from long documents. It accepts PDF, DOC, DOCX, and TXT files as well as web URLs, which makes it a more versatile alternative to tools such as ChatPDF. The project is designed for straightforward integration into other applications and is aimed at researchers, analysts, and developers who work with large volumes of text.

How It Works

The project works around LLM token limits by processing the document content and storing the resulting "memorized information" (likely embeddings or indexed text) in a local cache directory, ./memory. This allows conversational interaction with documents far longer than the model's context window. The system handles the supported file types and web URLs behind a single conversational interface, hiding the details of document parsing and LLM interaction.
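
The chunk-and-cache idea described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the function names split_into_chunks and load_or_build_memory, the character-based chunk limit, the SHA-256 cache key, and the JSON cache format are all assumptions made for the example. Only the ./memory directory name comes from the summary.

```python
import hashlib
import json
from pathlib import Path
from typing import List


def split_into_chunks(text: str, max_chars: int = 1000) -> List[str]:
    """Split text into pieces of at most max_chars, preferring paragraph breaks.

    Hypothetical helper: a real system would more likely count tokens.
    """
    chunks: List[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # Hard-split any paragraph that exceeds the limit on its own.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        # Merge short paragraphs into the current chunk while they still fit.
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


def load_or_build_memory(text: str, memory_dir: Path = Path("./memory")) -> List[str]:
    """Return cached chunks for this text, building the cache on first use.

    The cache key (a SHA-256 hash of the text) and the JSON layout are
    assumptions for illustration only.
    """
    memory_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cache_file = memory_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text(encoding="utf-8"))
    chunks = split_into_chunks(text)
    cache_file.write_text(json.dumps(chunks), encoding="utf-8")
    return chunks
```

Calling load_or_build_memory twice with the same text reads the second result from disk instead of reprocessing the document, which is the behavior the summary attributes to the ./memory cache.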

Quick Start & Requirements

  • Installation: Change into the project directory (cd ChatLongDoc) and install the dependencies with pip install -r requirements.txt.
  • Prerequisites: Python 3.8 or higher, and an OpenAI API key saved in ./openai_api_key.txt.
  • Usage: Run from Python (see the demo.ipynb notebook) or from the shell (python chatLongDoc.py --text_path "your_text_path"). Inputs can be local files (PDF, DOC, DOCX, TXT) or web URLs. Memory caching is automatic and can be controlled explicitly with --memory_path.
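
The setup steps above could be checked from Python before launching the tool, along these lines. This is a sketch under stated assumptions: check_prerequisites and build_command are hypothetical helpers, not part of ChatLongDoc. The script name and the --text_path and --memory_path flags are taken from the summary, but the exact value --memory_path expects is not documented here, so treating it as a path string is an assumption.

```python
import sys
from pathlib import Path
from typing import List, Optional


def check_prerequisites(project_dir: Path) -> List[str]:
    """Return a list of problems found; an empty list means ready to run."""
    problems: List[str] = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or higher is required")
    key_file = project_dir / "openai_api_key.txt"
    if not key_file.is_file() or not key_file.read_text(encoding="utf-8").strip():
        problems.append(f"missing or empty API key file: {key_file}")
    return problems


def build_command(text_path: str, memory_path: Optional[str] = None) -> List[str]:
    """Assemble the documented shell invocation as an argument list."""
    command = [sys.executable, "chatLongDoc.py", "--text_path", text_path]
    if memory_path is not None:
        # Assumption: --memory_path takes a single path-like value.
        command += ["--memory_path", memory_path]
    return command
```

If check_prerequisites(Path(".")) returns an empty list, the result of build_command("your_text_path") can be passed to subprocess.run with cwd set to the ChatLongDoc directory.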

Highlighted Details

  • Supports multiple input types: PDF, DOC, DOCX, and TXT files, plus web URLs.
  • Offers a WebApp, GPTs, ChatGPT Plugin, and Browser Extension for broader accessibility.
  • Designed for ease of expansion and integration into other applications.
  • Automatic caching of processed document content for faster subsequent access.

Maintenance & Community

The README suggests continued development: it mentions a latest WebApp release and lists companion tools (GPTs, a ChatGPT Plugin, and a Browser Extension). It also references a guide for deploying Chinese chat LLMs.

Licensing & Compatibility

The provided README does not specify a license. Users should verify licensing terms before adoption, especially for commercial use or integration into proprietary systems.

Limitations & Caveats

The project requires an OpenAI API key, which incurs usage costs and ties it to the availability of OpenAI's service. The README does not detail limitations on document complexity, language support (beyond the mention of Chinese LLM guides), or platform compatibility.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
