Locally hosted tool connecting documents to LLMs for summarization and querying
BriefGPT is a locally hosted tool for summarizing and querying documents with Large Language Models (LLMs). It targets users who prioritize data privacy and control, and it offers a secure way to interact with documents through a simple GUI.
How It Works
The tool processes documents by chunking them and creating FAISS indexes for efficient similarity search. A novel re-ranking function refines retrieved results by stripping stopwords and using fuzzy matching to improve relevance over pure similarity. For summarization, documents (or YouTube transcripts) are embedded, clustered using K-means, and then summarized in a two-step process: individual chunk summarization followed by a final aggregation.
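The re-ranking idea can be sketched as follows. This is a minimal illustration, not BriefGPT's actual code: the stopword list, the function names (strip_stopwords, fuzzy_score, rerank), the 0.5 weight, and the use of Python's standard difflib for fuzzy matching are all assumptions made for the example. It also assumes each retrieved hit arrives as a (text, similarity) pair with similarity scaled to 0..1, higher being better.

```python
from difflib import SequenceMatcher

# Hypothetical stopword list; a real tool would likely use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "is", "are",
             "and", "or", "for", "on", "what", "how"}


def strip_stopwords(text: str) -> str:
    """Lowercase the text, trim punctuation from each word, drop stopwords."""
    words = (w.strip(".,;:!?()\"'") for w in text.lower().split())
    return " ".join(w for w in words if w and w not in STOPWORDS)


def fuzzy_score(query: str, chunk: str) -> float:
    """Mean, over query terms, of the best fuzzy match among chunk terms (0..1)."""
    q_terms = strip_stopwords(query).split()
    c_terms = strip_stopwords(chunk).split()
    if not q_terms or not c_terms:
        return 0.0
    best = [max(SequenceMatcher(None, q, c).ratio() for c in c_terms)
            for q in q_terms]
    return sum(best) / len(best)


def rerank(query, hits, weight=0.5):
    """Blend the retriever's similarity with the fuzzy score and sort descending.

    hits: list of (chunk_text, similarity) pairs, similarity assumed in 0..1.
    weight: assumed blending factor; 0 keeps the original ranking.
    """
    scored = [(text, (1 - weight) * sim + weight * fuzzy_score(query, text))
              for text, sim in hits]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative hits: the second chunk has slightly lower similarity
# but shares the term "refund" with the query.
hits = [
    ("The warranty covers parts and labour.", 0.62),
    ("Refunds are issued within 30 days of purchase.", 0.60),
]
reranked = rerank("What is the refund policy?", hits)
for text, score in reranked:
    print(f"{score:.3f}  {text}")
```

In this toy run the chunk about refunds moves ahead of the warranty chunk because "refund" closely matches "refunds" once stopwords are removed, which is the kind of correction over pure similarity that the description above suggests.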
Quick Start & Requirements
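Install dependencies: pip install -r requirements.txt
Run the app: streamlit run main.py
Place documents in the documents folder.
Ensure pandoc is installed and in your PATH.
Environment variables: see test.env.
Also required: a models folder and environment variable configuration.
Highlighted Details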
Maintenance & Community
This project was made for fun and is open to contributions and bug reports.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is experimental, potentially buggy, and not fully optimized. The local LLM functionality in particular may be significantly slower and produce output of variable quality. Summary state persistence is a noted TODO item.