CLI tool for LLM-based book summarization into bulleted notes
This project provides a tool for generating comprehensive, bulleted note summaries of books and long texts, primarily targeting users who need to efficiently extract and organize information from digital documents like EPUBs and PDFs. It leverages local LLMs to process text, aiming to overcome the context window limitations of models by chunking documents into manageable segments.
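The chunk-and-summarize pipeline described above can be sketched as follows. This is a minimal illustration, not the project's actual code; the `summarize` callable stands in for a call to a local LLM and is an assumption for illustration.

```python
def summarize_chunks(chunks, summarize):
    """Run the same summarization prompt over every chunk and join the
    resulting bulleted notes. Unlike a RAG system, no chunk is skipped,
    so the final notes cover the entire document."""
    notes = [summarize(chunk) for chunk in chunks]
    return "\n\n".join(notes)
```

In the real tool, `summarize` would invoke a local model (e.g. via Ollama) with a fixed note-taking prompt for each chunk.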
How It Works
The core approach involves splitting long texts into approximately 2000-token chunks, a size identified as optimal for maintaining LLM reasoning performance. Unlike RAG systems that selectively retrieve chunks, this tool processes every chunk with the same prompts, ensuring comprehensive coverage. It prioritizes documents with Table of Contents (ToC) metadata for automated chapter extraction, with fallbacks for documents lacking this structure.
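The chunking step might look like the sketch below. It is an illustrative assumption, not the project's implementation: tokens are approximated by whitespace-separated words, whereas the real tool would count tokens with the model's tokenizer against its ~2000-token budget.

```python
def chunk_text(text, max_tokens=2000):
    """Split text into chunks of at most max_tokens, breaking on
    paragraph boundaries so each chunk stays coherent. Tokens are
    approximated here by whitespace-separated words; a single
    paragraph longer than max_tokens passes through unsplit."""
    paragraphs = text.split("\n\n")
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk is then fed to the model with the same summarization prompt, so coverage does not depend on retrieval quality.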
Quick Start & Requirements
- Install dependencies: pip install -r requirements.txt
- Requires local models (e.g. cognitivetech/obook_summary, cognitivetech/obook_title, gemma2).
- Configure _config.yaml with model paths.
- Convert books to text with book2text.py, then generate summaries with sum.py.

Highlighted Details
- Custom fine-tuned models (e.g. cognitivetech/obook_summary) are available on Ollama and HuggingFace.
- Includes a script (split_pdf.py) for more granular PDF splitting.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project notes that summaries, especially those involving references, require manual verification due to potential inaccuracies or hallucinations from the LLM. The configuration file (_config.yaml) is also subject to change.