ollama-ebook-summary  by cognitivetech

CLI tool for LLM-based book summarization into bulleted notes

created 1 year ago
578 stars

Top 56.8% on sourcepulse

Project Summary

This project generates bulleted-note summaries of books and other long texts. It targets users who need to extract and organize information from digital documents such as EPUBs and PDFs. It runs local LLMs and works around their context-window limits by splitting each document into manageable chunks.

How It Works

The core approach splits long texts into chunks of approximately 2000 tokens, a size the project identifies as optimal for preserving LLM reasoning performance. Unlike RAG systems, which retrieve only selected chunks, this tool runs every chunk through the same prompts, so no part of the text is skipped. It works best on documents with Table of Contents (ToC) metadata, which allows automated chapter extraction, and falls back to other methods for documents that lack this structure.
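The chunking step can be illustrated with a minimal sketch. This is not the project's own code: the function name chunk_text is hypothetical, and tokens are approximated by whitespace-separated words, whereas a real tokenizer would count differently. It packs whole paragraphs greedily until the next one would exceed the budget.

```python
def chunk_text(text: str, max_tokens: int = 2000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens words.

    Assumption: one whitespace-separated word counts as one token.
    A single paragraph longer than max_tokens is kept whole as its
    own oversized chunk rather than being split mid-paragraph.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in text.split("\n\n"):
        words = para.split()
        if not words:
            continue  # skip empty paragraphs
        # Close the current chunk if this paragraph would overflow it.
        if current and count + len(words) > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(" ".join(words))
        count += len(words)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which matters because every chunk is summarized independently.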

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python 3.11.9, Ollama, and specific Ollama models (cognitivetech/obook_summary, cognitivetech/obook_title, gemma2).
  • Setup: Requires downloading models and configuring _config.yaml with model paths.
  • Usage: Convert EPUB/PDF to chunked CSV/TXT using book2text.py, then generate summaries with sum.py.
  • Docs: Ollama, HuggingFace
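To make the summarization step concrete, the sketch below sends each chunk to a local Ollama server with the same instruction. It is an illustration, not how sum.py is implemented: the function names and the instruction text are hypothetical, and it assumes Ollama is listening on its default local port and that the cognitivetech/obook_summary model has already been pulled.

```python
import json
from urllib import request

# Assumption: Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical instruction text; the project ships its own prompts.
INSTRUCTION = "Write comprehensive bulleted notes summarizing the text below."


def build_payload(chunk: str, model: str = "cognitivetech/obook_summary") -> dict:
    """Build the request body; the same instruction is used for every chunk."""
    return {
        "model": model,
        "prompt": f"{INSTRUCTION}\n\n{chunk}",
        "stream": False,  # ask for a single JSON reply instead of a stream
    }


def ollama_generate(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to Ollama and return the generated text."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


def summarize_chunks(chunks, send=ollama_generate) -> list[str]:
    """Summarize every chunk in order; nothing is filtered or retrieved."""
    return [send(build_payload(chunk)) for chunk in chunks]
```

The send parameter lets the loop be exercised without a running server, for example by passing a stub function in place of ollama_generate.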

Highlighted Details

  • Supports EPUB and PDF formats, with EPUB preferred for ToC metadata.
  • Offers various pre-defined prompts for different summarization needs (e.g., bulleted notes, research arguments, markdown formatting).
  • Fine-tuned models (cognitivetech/obook_summary) are available on Ollama and HuggingFace.
  • Includes a prototype script (split_pdf.py) for more granular PDF splitting.

Maintenance & Community

  • The project is maintained by cognitivetech.
  • The README's Resources section links to LLM evaluation leaderboards (OpenAI, HuggingFace, Chatbot Arena).

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project notes that summaries, especially those involving references, require manual verification because the LLM may introduce inaccuracies or hallucinations. The configuration file (_config.yaml) is described as subject to change.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 39 stars in the last 90 days
