CLI tool for semantic document search
Semantra is a command-line tool for semantic document search, enabling users to query text and PDF files by meaning rather than exact keyword matching. It's designed for individuals like journalists, researchers, and students who need to efficiently find information within large document sets, offering a private, configurable, and user-friendly experience.
How It Works
Semantra analyzes documents by converting text into numerical embeddings using transformer models. These embeddings capture semantic meaning, allowing for searches based on conceptual similarity. The tool then launches a local web interface for interactive querying, where results are ranked by relevance and can be refined using positive or negative feedback on specific snippets. This approach prioritizes direct interaction with source material over generative AI summaries.
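The ranking and feedback loop described above can be sketched in a few lines. This is a minimal illustration, not Semantra's actual implementation: the three-dimensional vectors are hand-made stand-ins for real transformer embeddings, the snippet names are hypothetical, and the `refine` step assumes a simple Rocchio-style update (add liked vectors, subtract disliked ones), which may differ from how the tool weights feedback.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, snippets):
    """Return snippet names sorted by descending similarity to the query."""
    return sorted(
        snippets,
        key=lambda name: cosine(query_vec, snippets[name]),
        reverse=True,
    )


def refine(query_vec, positives=(), negatives=(), weight=0.5):
    """Shift the query toward liked snippets and away from disliked ones.

    Assumption: a Rocchio-style update, chosen here only for illustration.
    """
    refined = list(query_vec)
    for vec in positives:
        refined = [r + weight * v for r, v in zip(refined, vec)]
    for vec in negatives:
        refined = [r - weight * v for r, v in zip(refined, vec)]
    return refined


# Hypothetical toy "embeddings"; real models produce hundreds of dimensions.
snippets = {
    "budget_report": [0.9, 0.1, 0.0],
    "tax_policy": [0.7, 0.6, 0.1],
    "football_scores": [0.0, 0.1, 0.9],
}
query = [0.8, 0.3, 0.1]

# First pass: rank by similarity to the raw query.
initial = rank(query, snippets)

# Second pass: the user marks one snippet as relevant and one as not.
refined_query = refine(
    query,
    positives=[snippets["tax_policy"]],
    negatives=[snippets["budget_report"]],
)
refined = rank(refined_query, snippets)

print(initial)
print(refined)
```

With these toy vectors, the positive and negative feedback is enough to move `tax_policy` ahead of `budget_report`, while the unrelated snippet stays last in both passes.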
Quick Start & Requirements
Install: python3 -m pipx install semantra (requires Python >= 3.9 and pipx).
Run: semantra doc.pdf or semantra file1.txt file2.pdf.
Highlighted Details
Embedding models: transformer models (minilm, mpnet) or OpenAI's API.
Maintenance & Community
Contributions are welcome; issues and feature requests can be submitted via GitHub.
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and integration with closed-source applications.
Limitations & Caveats
Semantra does not utilize generative AI models like ChatGPT, focusing solely on semantic search and presenting raw results. The initial document processing can be time-consuming.