CLI tool for LLM-based book summarization into bulleted notes
This project provides a tool for generating comprehensive, bulleted note summaries of books and long texts, primarily targeting users who need to efficiently extract and organize information from digital documents like EPUBs and PDFs. It leverages local LLMs to process text, aiming to overcome the context window limitations of models by chunking documents into manageable segments.
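The chunk-and-summarize pipeline described above can be sketched as follows. This is a minimal illustration, not the project's actual code; the `summarize` callable stands in for a call to a local LLM and is an assumption for illustration.

```python
def summarize_chunks(chunks, summarize):
    """Run the same summarization prompt over every chunk and join the
    resulting bulleted notes. Unlike a RAG system, no chunk is skipped,
    so the final notes cover the entire document."""
    notes = [summarize(chunk) for chunk in chunks]
    return "\n\n".join(notes)
```

In the real tool, `summarize` would invoke a local model (e.g. via Ollama) with a fixed note-taking prompt for each chunk.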
How It Works
The core approach involves splitting long texts into approximately 2000-token chunks, a size identified as optimal for maintaining LLM reasoning performance. Unlike RAG systems that selectively retrieve chunks, this tool processes every chunk with the same prompts, ensuring comprehensive coverage. It prioritizes documents with Table of Contents (ToC) metadata for automated chapter extraction, with fallbacks for documents lacking this structure.
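The chunking step might look like the sketch below. It is an illustrative assumption, not the project's implementation: tokens are approximated by whitespace-separated words, whereas the real tool would count tokens with the model's tokenizer against its ~2000-token budget.

```python
def chunk_text(text, max_tokens=2000):
    """Split text into chunks of at most max_tokens, breaking on
    paragraph boundaries so each chunk stays coherent. Tokens are
    approximated here by whitespace-separated words; a single
    paragraph longer than max_tokens passes through unsplit."""
    paragraphs = text.split("\n\n")
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and current_len + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        current.append(para)
        current_len += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk is then fed to the model with the same summarization prompt, so coverage does not depend on retrieval quality.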
Quick Start & Requirements
- Install dependencies: pip install -r requirements.txt
- Requires local models (e.g. cognitivetech/obook_summary, cognitivetech/obook_title, gemma2).
- Configure _config.yaml with model paths.
- Convert books to text with book2text.py, then generate summaries with sum.py.

Highlighted Details
- Custom fine-tuned models (e.g. cognitivetech/obook_summary) are available on Ollama and HuggingFace.
- Includes a script (split_pdf.py) for more granular PDF splitting.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project notes that summaries, especially those involving references, require manual verification due to potential inaccuracies or hallucinations from the LLM. The configuration file (_config.yaml) is also subject to change.