BriefGPT  by e-johnstonn

Locally hosted tool connecting documents to LLMs for summarization and querying

Created 2 years ago
796 stars

Top 44.2% on SourcePulse

Project Summary

BriefGPT is a locally hosted tool for summarizing and querying documents with Large Language Models (LLMs). It targets users who prioritize data privacy and control, and it exposes its features through a simple GUI.

How It Works

For querying, the tool splits each document into chunks and builds a FAISS index over them for efficient similarity search. A novel re-ranking function then refines the retrieved chunks by stripping stopwords and applying fuzzy matching, which improves relevance over similarity search alone. For summarization, documents (or YouTube transcripts) are embedded and clustered with K-means, then summarized in two steps: chunks are summarized individually, and those summaries are aggregated into a final summary.
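
The re-ranking idea can be illustrated with a minimal sketch. It is not BriefGPT's actual function: the names `STOPWORDS`, `strip_stopwords`, `fuzzy_score`, `rerank`, and the `weight` parameter are hypothetical, the stopword list is deliberately tiny, and the standard library's `difflib` stands in for whichever fuzzy-matching library the project uses. The `candidates` argument stands in for the (chunk, similarity) pairs a FAISS search would return.

```python
from difflib import SequenceMatcher

# Tiny illustrative stopword list (assumption); a real tool would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and",
             "for", "on", "what", "how", "does"}


def strip_stopwords(text):
    """Lowercase the text, trim punctuation from token edges, drop stopwords."""
    tokens = (tok.strip(".,;:!?()\"'").lower() for tok in text.split())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def fuzzy_score(query, chunk):
    """Average, over query keywords, of the best fuzzy match among chunk words."""
    q_words = strip_stopwords(query)
    c_words = strip_stopwords(chunk)
    if not q_words or not c_words:
        return 0.0
    best = [max(SequenceMatcher(None, q, c).ratio() for c in c_words)
            for q in q_words]
    return sum(best) / len(best)


def rerank(query, candidates, weight=0.5):
    """Blend vector similarity with the fuzzy keyword score and sort descending.

    candidates: list of (chunk_text, similarity) pairs, similarity in [0, 1].
    weight: share of the final score given to the fuzzy keyword match.
    """
    scored = [
        (chunk, (1 - weight) * sim + weight * fuzzy_score(query, chunk))
        for chunk, sim in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rerank(query, candidates)` returns the same chunks with blended scores, highest first. Blending rather than replacing the similarity score is a design choice of this sketch: it lets keyword overlap promote a chunk without discarding the embedding signal.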

Quick Start & Requirements

  • Install dependencies: pip install -r requirements.txt
  • Run the application: streamlit run main.py
  • Place documents in the documents folder.
  • For EPUB support, ensure pandoc is installed and in your PATH.
  • Set your API key in test.env.
  • To use a local LLM, place the model files in the models folder and configure the corresponding environment variables.
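
As a rough illustration of the configuration step, the sketch below parses a dotenv-style file and exports its values. It assumes test.env contains plain KEY=VALUE lines, which the summary does not confirm. The helper names `parse_env_text` and `apply_env` are hypothetical, and so are any variable names you pass through them, such as `OPENAI_API_KEY`.

```python
import os


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines, # comments, and malformed lines."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def apply_env(values):
    """Export parsed values without overwriting variables that are already set."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

With this sketch, `apply_env(parse_env_text(open("test.env").read()))` would load the file before the app starts. Using `setdefault` means a variable already set in the shell takes precedence over the file.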

Highlighted Details

  • Supports both OpenAI API and fully local LLM execution (LlamaCpp, GPT4ALL).
  • Handles PDF and TXT documents, with EPUB support via pandoc.
  • YouTube transcript summarization is also supported.
  • Utilizes K-means clustering for thematic document grouping.
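
The clustering-based summarization can be sketched in a few lines. This is a simplified, pure-Python illustration, not the project's code. It assumes chunk embeddings are already available as lists of floats, that the chunk nearest each cluster centroid is the one summarized (the summary does not say how chunks are chosen), and that `summarize` is a hypothetical callable wrapping an LLM call. All function names here are illustrative.

```python
import math
import random


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels


def representative_indices(vectors, k):
    """Index of the chunk closest to each cluster centroid, in document order."""
    centroids, labels = kmeans(vectors, k)
    picks = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            picks.append(min(members, key=lambda i: math.dist(vectors[i], centroid)))
    return sorted(picks)


def summarize_document(chunks, vectors, k, summarize):
    """Two-step summary: summarize representative chunks, then aggregate them."""
    partial = [summarize(chunks[i]) for i in representative_indices(vectors, k)]
    return summarize("\n".join(partial))
```

Summarizing one representative chunk per cluster keeps the number of LLM calls at roughly `k + 1` regardless of document length, which is one plausible motivation for clustering before summarizing.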

Maintenance & Community

This project was made for fun and is open to contributions and bug reports.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is experimental, potentially buggy, and not fully optimized. Local LLM execution in particular may be significantly slower and produce results of variable quality. Persisting summary state is listed as a TODO item.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
