BriefGPT by e-johnstonn

Locally hosted tool connecting documents to LLMs for summarization and querying

created 2 years ago
794 stars

Top 45.1% on sourcepulse

Project Summary

BriefGPT is a locally hosted tool for document summarization and querying using Large Language Models (LLMs). It targets users who prioritize data privacy and control, letting them interact with their documents through a simple GUI.

How It Works

For querying, the tool splits documents into chunks and builds FAISS indexes over them for efficient similarity search. A custom re-ranking function then refines the retrieved chunks by stripping stopwords and applying fuzzy matching, which improves relevance over ranking by embedding similarity alone. For summarization, documents (or YouTube transcripts) are embedded, clustered using K-means, and then summarized in two steps: each selected chunk is summarized individually, and the chunk summaries are aggregated into a final summary.
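The re-ranking idea can be illustrated with a minimal sketch. This is not BriefGPT's actual function: the stopword list, the `fuzzy_score` and `rerank` names, and the 0.8 threshold are all assumptions made for illustration, and the standard library's `difflib.SequenceMatcher` stands in for whatever fuzzy-matching library the project uses.

```python
from difflib import SequenceMatcher

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "to",
             "and", "for", "with", "what", "how", "does"}


def strip_stopwords(text):
    """Lowercase the text, trim surrounding punctuation, drop stopwords."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    return [w for w in words if w and w not in STOPWORDS]


def fuzzy_score(query, chunk, threshold=0.8):
    """Fraction of query keywords that fuzzily match some word in the chunk."""
    q_words = strip_stopwords(query)
    c_words = strip_stopwords(chunk)
    if not q_words or not c_words:
        return 0.0
    hits = 0
    for q in q_words:
        # Best character-level similarity between this keyword and any chunk word.
        best = max(SequenceMatcher(None, q, c).ratio() for c in c_words)
        if best >= threshold:
            hits += 1
    return hits / len(q_words)


def rerank(query, chunks):
    """Reorder retrieved chunks by fuzzy keyword overlap, best first."""
    return sorted(chunks, key=lambda c: fuzzy_score(query, c), reverse=True)
```

In a pipeline like the one described, `rerank` would be applied to the handful of chunks returned by the similarity search, so that chunks sharing (possibly misspelled) keywords with the question move to the top.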

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run the application: streamlit run main.py
  • Place documents in the documents folder.
  • For EPUB support, ensure pandoc is installed and in your PATH.
  • Configure the API key in test.env.
  • Local LLM support requires models in the models folder and environment variable configuration.

Highlighted Details

  • Supports both the OpenAI API and fully local LLM execution (LlamaCpp, GPT4All).
  • Handles PDF and TXT documents, with EPUB support via pandoc.
  • YouTube transcript summarization is also supported.
  • Utilizes K-means clustering for thematic document grouping.
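The clustering-based summarization can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: it uses a small pure-Python K-means instead of a library implementation, the function names are hypothetical, and `summarize` is a placeholder callable standing in for an LLM call.

```python
import math
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def representative_indices(vectors, k):
    """Index of the chunk embedding closest to each centroid, in document order."""
    centroids, _ = kmeans(vectors, k)
    picked = {min(range(len(vectors)), key=lambda i: math.dist(vectors[i], c))
              for c in centroids}
    return sorted(picked)


def summarize_document(chunks, vectors, k, summarize):
    """Two-step summary: summarize representative chunks, then aggregate.

    `summarize` is a hypothetical stand-in for an LLM call.
    """
    partial = [summarize(chunks[i]) for i in representative_indices(vectors, k)]
    return summarize("\n".join(partial))
```

Summarizing only one representative chunk per cluster keeps the number of LLM calls proportional to `k` rather than to the document length, which is the usual motivation for clustering before summarizing.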

Maintenance & Community

This project was made for fun and is open to contributions and bug reports.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is experimental, potentially buggy, and not fully optimized. Local LLM functionality in particular may be significantly slower and produce output of variable quality. Persisting summary state is a noted TODO item.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 90 days