chatWeb  by SkywalkerDarren

CLI tool for question answering and summarization over web pages and documents

created 2 years ago
907 stars

Top 40.9% on sourcepulse

Project Summary

ChatWeb extracts and summarizes content from web pages and from PDF, DOCX, and TXT files, and lets users ask questions about the extracted text. It targets users who need to process and query large documents or web content, and it works around model token limits by using embeddings and a vector database to retrieve only the relevant text.

How It Works

The system crawls web pages or extracts text from documents, then uses OpenAI's embedding API to create a vector representation of each text segment. These vectors are stored in a vector database. At query time, the tool derives keywords from the user's input and embeds those keywords, rather than the entire query, with the aim of improving search accuracy. A nearest-neighbor search over the stored vectors retrieves the most relevant text segments, and GPT-3.5's chat API then formulates an answer based on the retrieved segments.
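The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not ChatWeb's actual code: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding API call, `extract_keywords` is a simple stopword filter standing in for the keyword derivation step, and the segment texts are made up for the example.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, used only for this illustration.
STOPWORDS = {"a", "an", "the", "is", "are", "what", "how", "does", "do",
             "of", "to", "in", "on", "for", "and", "it"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def toy_embed(text):
    """Stand-in for an embedding API call: a sparse word-count vector."""
    return Counter(tokenize(text))

def extract_keywords(question):
    """Keep only the non-stopword tokens of the question."""
    return " ".join(w for w in tokenize(question) if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_segments(question, segments, k=2):
    """Embed the question's keywords and return the k most similar segments."""
    query_vec = toy_embed(extract_keywords(question))
    ranked = sorted(segments,
                    key=lambda s: cosine(query_vec, toy_embed(s)),
                    reverse=True)
    return ranked[:k]

segments = [
    "pgvector adds a vector column type to PostgreSQL.",
    "The web UI listens on port 7860 by default.",
    "Temperature controls how random the model output is.",
]
best = top_segments("What port does the web UI use?", segments, k=1)
print(best)
```

In a real pipeline the retrieved segments would be placed into the chat prompt as context; swapping `toy_embed` for an actual embedding call leaves the ranking logic unchanged.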

Quick Start & Requirements

  • Installation: Clone the repository, install dependencies (pip3 install -r requirements.txt), and run python3 main.py. Docker is also supported via docker-compose up.
  • Prerequisites: Python 3, OpenAI API key.
  • Optional: PostgreSQL with the pgvector extension for persistent storage.
  • Configuration: Edit config.json to set API keys, language, mode (console, api, webui), streaming, temperature, and proxy settings.
  • Web UI: served locally at http://localhost:7860 (the default web UI port).
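To illustrate the configuration step, here is a small sketch of loading and validating such settings. The key names (`api_key`, `mode`, `temperature`, `stream`) are assumptions chosen for this example and may not match the real config.json; only the three mode values come from the summary above.

```python
import json

# Modes named in the summary; everything else here is hypothetical.
ALLOWED_MODES = {"console", "api", "webui"}

def load_config(raw):
    """Parse a JSON config string and validate a few assumed fields."""
    cfg = json.loads(raw)
    if not cfg.get("api_key"):
        raise ValueError("api_key is required")
    mode = cfg.get("mode", "console")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    temperature = float(cfg.get("temperature", 0.0))
    # OpenAI's chat API accepts temperatures between 0 and 2.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0 and 2")
    return {
        "api_key": cfg["api_key"],
        "mode": mode,
        "temperature": temperature,
        "stream": bool(cfg.get("stream", False)),
    }

# Placeholder values only; a real key would be read from config.json.
sample = '{"api_key": "sk-placeholder", "mode": "webui", "temperature": 0.3, "stream": true}'
config = load_config(sample)
print(config["mode"], config["temperature"])
```

Validating the mode and temperature up front means a typo in the file fails at startup rather than on the first API request.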

Highlighted Details

  • Supports web page crawling and extraction from PDF, DOCX, TXT files.
  • Offers multiple operational modes: console, API, and web UI.
  • Includes optional PostgreSQL integration with pgvector for enhanced data management.
  • Features configurable OpenAI API proxy settings and response temperature.
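For the optional pgvector integration, a nearest-neighbor lookup might be expressed as below. This is a sketch under stated assumptions: the table name `segments` and the columns `content` and `embedding` are hypothetical, the `%s` placeholders assume a psycopg-style driver, and the choice of pgvector's cosine-distance operator `<=>` is illustrative, since the summary does not say which distance ChatWeb uses.

```python
def to_vector_literal(vec):
    """Format a list of numbers in pgvector's text form, e.g. '[0.1,0.25,3.0]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"

def nearest_segments_query(query_vec, k=5):
    """Build a parameterized SQL query and its parameters for a k-NN search.

    Table and column names are hypothetical. '<=>' is pgvector's
    cosine-distance operator, so smaller values mean closer matches.
    """
    sql = (
        "SELECT content FROM segments "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    return sql, (to_vector_literal(query_vec), k)

sql, params = nearest_segments_query([0.1, 0.25, 3], k=3)
print(sql)
print(params)
```

The returned pair can be passed to a database cursor's `execute` call; keeping the vector as a bound parameter avoids building SQL by string concatenation.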

Maintenance & Community

The project is maintained by SkywalkerDarren, with the most recent commit about a year ago. The README does not provide further community or roadmap details.

Licensing & Compatibility

The repository does not explicitly state a license. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The project relies primarily on OpenAI's GPT-3.5 API, so usage incurs the associated API costs. The README's feature list marks most items as implemented, but open-ended entries such as "Other features that have not been thought of yet" remain open.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
