semtools by run-llama

CLI tools for semantic search and document parsing

Created 3 weeks ago

951 stars

Top 38.6% on SourcePulse

Project Summary

This project provides high-performance command-line tools for document processing and semantic search, built with Rust. It targets developers and power users who need efficient text analysis and retrieval, either local or cloud-assisted, and its Unix-friendly interface fits into existing workflows.

How It Works

The parse tool uses the LlamaParse API (or other backends) to convert documents in formats such as PDF and DOCX into Markdown, with caching and concurrent processing for speed. The search tool runs fast, local semantic keyword searches using model2vec embeddings and cosine similarity. It offers per-line context matching and configurable distance thresholds, and it does not require a separate vector database.
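
The search approach described above can be illustrated with a small sketch. This is not semtools' implementation: `toy_embed` is a hypothetical stand-in for a model2vec embedding (it only hashes words into a fixed-size count vector), and the names `cosine_distance`, `search_lines`, `max_distance`, and `n_context` are assumptions chosen for illustration. It shows the general shape of the technique: embed each line, compare it with the query by cosine distance, keep lines under a threshold, and return the surrounding context lines.

```python
import math
import zlib


def toy_embed(text, dim=64):
    """Hypothetical stand-in for a real embedding model: hashed bag-of-words counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash()
        vec[zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vec


def cosine_distance(a, b):
    """Return 1 - cosine similarity: 0.0 for same direction, 1.0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat empty vectors as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def search_lines(query, lines, max_distance=0.5, n_context=1):
    """Return (line_index, distance, context_lines) for matching lines, best first."""
    query_vec = toy_embed(query)
    hits = []
    for i, line in enumerate(lines):
        dist = cosine_distance(query_vec, toy_embed(line))
        if dist <= max_distance:
            start = max(0, i - n_context)
            end = min(len(lines), i + n_context + 1)
            hits.append((i, dist, lines[start:end]))
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    doc = ["Installation notes", "semantic search with embeddings", "License: MIT"]
    for index, dist, context in search_lines("semantic search", doc):
        print(index, round(dist, 3), context)
```

Because every line is embedded and scored on the fly, a sketch like this needs no index or vector database, which mirrors the trade-off the summary describes.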

Quick Start & Requirements

  • Install: cargo install semtools (use --features=parse or --features=search to select a specific tool).
  • Prerequisites: Rust and Cargo.
  • parse tool: Requires a LlamaIndex Cloud API key, configurable via ~/.parse_config.json or the LLAMA_CLOUD_API_KEY environment variable.
  • Docs: LlamaIndex Cloud API
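
As a rough illustration of the two key sources listed above, the sketch below looks for a key in a JSON config file and then in the environment. Only the path ~/.parse_config.json and the variable name LLAMA_CLOUD_API_KEY come from the summary. The `"api_key"` field name, the precedence order (file first), and the function name `resolve_api_key` are assumptions made for this example and may not match what the parse tool actually does.

```python
import json
import os
from pathlib import Path


def resolve_api_key(config_path=None, environ=None):
    """Return an API key from a JSON config file or the environment, else None.

    Assumptions (not confirmed by the source): the config file stores the key
    under an "api_key" field, and the file takes precedence over the environment.
    """
    config_path = Path(config_path) if config_path else Path.home() / ".parse_config.json"
    environ = os.environ if environ is None else environ

    if config_path.is_file():
        try:
            config = json.loads(config_path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            config = {}  # unreadable or malformed file: fall through to the environment
        key = config.get("api_key") if isinstance(config, dict) else None
        if key:
            return key

    return environ.get("LLAMA_CLOUD_API_KEY") or None
```

Passing `config_path` and `environ` explicitly keeps the lookup easy to test without touching the real home directory or process environment.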

Highlighted Details

  • Fast semantic search using model2vec embeddings.
  • Reliable document parsing with caching and error handling.
  • Unix-friendly design with stdin/stdout support.
  • Multi-format parsing (PDF, DOCX, PPTX, etc.).
  • Concurrent processing for parsing.
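
The caching and concurrent-processing points above can be sketched in a few lines. This is a generic illustration rather than the project's code: `parse_fn` is a hypothetical callable standing in for a real parsing backend, and keying the cache on a SHA-256 hash of the file contents is an assumption made for this example, since the summary does not say how the cache is keyed.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def content_key(data):
    """Cache key derived from file contents, so unchanged files are not re-parsed."""
    return hashlib.sha256(data).hexdigest()


def parse_all(documents, parse_fn, cache, max_workers=4):
    """Parse {name: bytes} concurrently, reusing cached results where possible.

    parse_fn is a hypothetical callable (bytes -> str) standing in for a real
    parsing backend; cache is any dict-like mapping of content key -> output.
    """
    results = {}
    pending = {}
    for name, data in documents.items():
        key = content_key(data)
        if key in cache:
            results[name] = cache[key]  # cache hit: skip the backend call
        else:
            pending[name] = (key, data)

    # Submit all cache misses at once so slow backend calls overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(parse_fn, data) for name, (_, data) in pending.items()}
        for name, future in futures.items():
            key = pending[name][0]
            cache[key] = results[name] = future.result()
    return results
```

Threads suit this kind of workload when each parse call mostly waits on a network response; a second run over unchanged inputs is served entirely from the cache.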

Maintenance & Community

  • Built with Rust, leveraging model2vec-rs and simsimd.
  • Contributions are welcome.
  • Licensed under the MIT License.

Licensing & Compatibility

  • MIT License.
  • Compatible with commercial use and closed-source linking.

Limitations & Caveats

The parse tool defaults to the LlamaParse API, which requires an API key and internet connectivity. Future work includes adding local-only parsing backends.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 21
  • Issues (30d): 5
  • Star history: 956 stars in the last 26 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Simon Willison (coauthor of Django).

semantra by freedmand

0.1%
3k
CLI tool for semantic document search
Created 2 years ago
Updated 1 year ago
Starred by John Resig (author of jQuery; Chief Software Architect at Khan Academy), Simon Horup Eskildsen (cofounder of Turbopuffer), and 21 more.

meilisearch by meilisearch

0.2%
53k
Search engine API for integrating AI-powered hybrid search
Created 7 years ago
Updated 1 day ago