sirchmunk  by modelscope

Indexless intelligence pipeline for dynamic data insights

Created 3 months ago
922 stars

Top 39.4% on SourcePulse

Project Summary

Sirchmunk offers an "embedding-free," real-time intelligence pipeline that bypasses traditional RAG system costs and rigidity. It processes raw data directly, providing agile insights for AI agents and power users, featuring a self-evolving knowledge base that adapts to data changes.

How It Works

This system eschews vector embeddings for direct, instant, full-fidelity raw data search. Its core is "Monte Carlo Evidence Sampling," a token-efficient, three-phase strategy for intelligent document region sampling and LLM processing. Search outputs form "Self-Evolving Knowledge Clusters" that dynamically update with queries. This query-driven evolution enables semantic broadening and near-instant retrieval of cached information, eliminating re-indexing and LLM calls for repeated queries.
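
The sampling idea can be illustrated with a toy sketch. This is not Sirchmunk's implementation: the summary does not spell out the three phases, so the sketch assumes they are roughly "explore broadly, focus around the best hit, keep the top regions." The function names, the keyword-overlap score, and all parameters are hypothetical stand-ins chosen for illustration.

```python
import random


def score_region(text: str, query_terms: list[str]) -> float:
    """Cheap stand-in relevance score: fraction of query terms found in the region."""
    lowered = text.lower()
    hits = sum(1 for term in query_terms if term.lower() in lowered)
    return hits / len(query_terms) if query_terms else 0.0


def sample_evidence(document, query_terms, window=200,
                    n_explore=20, n_focus=20, top_k=3, seed=0):
    """Return up to top_k (start, text) regions that match at least one query term."""
    rng = random.Random(seed)
    max_start = max(len(document) - window, 0)

    def scored(start):
        # Clamp the sampled offset into the document, then score that window.
        start = min(max(int(start), 0), max_start)
        return (score_region(document[start:start + window], query_terms), start)

    # Phase 1 (assumed): explore with uniformly random windows.
    candidates = {scored(rng.uniform(0, max_start)) for _ in range(n_explore)}

    # Phase 2 (assumed): resample near the best window found so far.
    best_score, best_start = max(candidates)
    if best_score > 0:
        candidates |= {scored(rng.gauss(best_start, window)) for _ in range(n_focus)}

    # Phase 3 (assumed): keep only the highest-scoring regions; in a real
    # pipeline these would be the only text handed to the LLM.
    ranked = sorted(candidates, reverse=True)[:top_k]
    return [(start, document[start:start + window])
            for score, start in ranked if score > 0]
```

Only the few selected regions would reach the LLM, which is where the token savings in such a scheme would come from.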

Quick Start & Requirements

Install with pip install sirchmunk or uv pip install sirchmunk. For the web UI, use pip install "sirchmunk[web]". Prerequisites: Python 3.10+, an OpenAI-compatible LLM API key, and optionally Node.js 18+ for UI builds. ripgrep and ripgrep-all are also needed; they are installed automatically where possible and otherwise require manual setup. Documentation is at https://modelscope.github.io/sirchmunk-web/.
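
Before installing, it can help to confirm that the prerequisites are visible from the current environment. The following is a minimal sketch using only the Python standard library; the function name check_prerequisites is hypothetical and is not part of Sirchmunk. It assumes the usual executable names rg (ripgrep), rga (ripgrep-all), and node (Node.js). It does not verify the Node.js version or the LLM API key.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Report which prerequisites are visible from this environment."""
    return {
        # Python 3.10+ is listed as a hard requirement.
        "python>=3.10": tuple(version_info[:2]) >= (3, 10),
        # Search tools: may need manual setup if not installed automatically.
        "ripgrep (rg)": which("rg") is not None,
        "ripgrep-all (rga)": which("rga") is not None,
        # Only needed for building the web UI.
        "node (optional, UI builds)": which("node") is not None,
    }
```

Calling check_prerequisites() with no arguments inspects the running interpreter and the current PATH; the two parameters exist only so the check can be exercised with fake values.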

Highlighted Details

  • Embedding-Free Retrieval: Direct search on raw files, bypassing vectorization overhead for immediate, lossless data access.
  • Self-Evolving Knowledge Clusters: Dynamically updated knowledge units learn from queries, enabling query-driven embeddings and accelerating similar future searches.
  • Monte Carlo Evidence Sampling: Novel, token-efficient method for deep, context-aware evidence extraction.
  • MCP Integration: Exposes search as MCP tools for seamless integration with AI assistants (e.g., Claude Desktop, Cursor IDE).
  • Multi-Mode Search: Supports FAST (default, rapid), DEEP (comprehensive analysis), and FILENAME_ONLY (metadata) modes.
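
The query-driven reuse behind Self-Evolving Knowledge Clusters can be pictured as a cache that answers similar queries without repeating the expensive search. The sketch below is an illustration only, not the project's data model: the QueryCache class, the word-overlap (Jaccard) similarity, and the 0.6 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


def _tokens(query: str) -> frozenset[str]:
    """Normalize a query into a set of lowercase words."""
    return frozenset(query.lower().split())


@dataclass
class QueryCache:
    answer_fn: Callable[[str], str]              # expensive path, e.g. search plus LLM call
    threshold: float = 0.6                       # minimum Jaccard overlap to count as a hit
    entries: dict = field(default_factory=dict)  # token set -> cached answer
    misses: int = 0

    def ask(self, query: str) -> str:
        key = _tokens(query)
        # Reuse a cached answer when an earlier query is similar enough.
        for cached_key, answer in self.entries.items():
            union = key | cached_key
            if union and len(key & cached_key) / len(union) >= self.threshold:
                return answer
        # Otherwise take the expensive path once and remember the result.
        self.misses += 1
        answer = self.answer_fn(query)
        self.entries[key] = answer
        return answer
```

In this toy version a repeated or lightly reworded query returns immediately from entries, while an unrelated query triggers answer_fn and adds a new entry, so the cache grows with usage.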

Maintenance & Community

Active development is evident from early 2026 releases. The project is hosted by modelscope on GitHub, and community interaction occurs primarily through GitHub issues and discussions.

Licensing & Compatibility

Licensed under Apache License 2.0, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Per the roadmap, web search integration, multi-modal support, and distributed search are not yet available. Core search functionality (except FILENAME_ONLY) requires an LLM API key, and the project appears to be in an early development stage.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 24
  • Issues (30d): 18
  • Star History: 495 stars in the last 30 days
