pageindex-mcp by VectifyAI

Vectorless RAG system for structured document reasoning

Created 6 months ago
261 stars

Top 97.4% on SourcePulse

Project Summary

PageIndex MCP is a vectorless, reasoning-based Retrieval-Augmented Generation (RAG) system that exposes an LLM-native, in-context tree index over the Model Context Protocol (MCP). It lets platforms such as Claude and Cursor retrieve information from documents through structured reasoning, much as a human expert navigates a document, without vector databases or chunking. For long documents such as PDFs, this preserves full context and is intended to improve retrieval accuracy and transparency.

How It Works

The system represents each document as a hierarchical tree. An LLM navigates that tree through multi-step reasoning and tree search, querying the index directly via MCP. Because no vector database or chunking is involved, the document's original context and structure are preserved. The stated advantage is higher relevance from explicit reasoning paths rather than similarity matching, which also makes retrieval more transparent and closer to how a person looks things up.
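To make the tree-search idea concrete, here is a minimal sketch in Python. It is not PageIndex's implementation: the Node class, the score function, and the sample report are all hypothetical, and a simple word-overlap count stands in for the relevance judgment that an LLM would make at each level of the tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One section of a document in a hypothetical tree index."""
    title: str
    summary: str
    pages: tuple[int, int]
    children: list["Node"] = field(default_factory=list)


def score(query: str, node: Node) -> int:
    """Stand-in for the LLM's relevance judgment: count shared words."""
    query_words = set(query.lower().split())
    node_words = set(f"{node.title} {node.summary}".lower().split())
    return len(query_words & node_words)


def tree_search(root: Node, query: str) -> list[Node]:
    """Descend from the root, taking the best-scoring child at each level.

    The returned path records every section visited, so it doubles as an
    explanation of how the final section was reached.
    """
    path = [root]
    current = root
    while current.children:
        current = max(current.children, key=lambda child: score(query, child))
        path.append(current)
    return path


# Hypothetical document outline, used only for illustration.
report = Node("Annual Report", "company annual report", (1, 120), [
    Node("Financial Statements", "revenue costs and cash flow", (10, 60), [
        Node("Income Statement", "revenue and net income by quarter", (12, 20)),
        Node("Cash Flow", "operating and investing cash flow", (21, 30)),
    ]),
    Node("Risk Factors", "market and regulatory risk", (61, 90)),
])

path = tree_search(report, "quarterly revenue and net income")
print(" > ".join(node.title for node in path))
print("pages:", path[-1].pages)
```

The point of the sketch is the shape of the result: retrieval yields a path through named sections and a page range, not a set of anonymous chunks ranked by similarity.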

Quick Start & Requirements

  • Installation: Add a pageindex server entry to your MCP configuration, using an API key obtained from the PageIndex Dashboard. For Claude Desktop, a one-click .mcpb installer is available from Releases. To upload local PDFs, run the local MCP server with npx -y @pageindex/mcp (requires Node.js ≥18.0.0).
  • Prerequisites: Node.js ≥18.0.0 for local server. API Key for cloud service.
  • Links: PageIndex Dashboard, PageIndex Chat, Releases.
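As a rough illustration of the local-server setup, the Python sketch below generates a client configuration entry that launches the server with the npx command quoted above. It assumes the mcpServers / command / args layout used by Claude Desktop-style MCP client configs; the function name is hypothetical, and how the API key is supplied for the cloud service is not covered here, so it is left out rather than guessed.

```python
import json


def local_server_entry() -> dict:
    """Build an MCP client entry that launches the local server via npx.

    Assumption: the client reads a top-level "mcpServers" object whose
    entries have "command" and "args" fields.
    """
    return {
        "mcpServers": {
            # "pageindex" is the server name used in the summary above.
            "pageindex": {
                "command": "npx",
                "args": ["-y", "@pageindex/mcp"],
            }
        }
    }


config_text = json.dumps(local_server_entry(), indent=2)
print(config_text)
```

The printed JSON can be merged into an existing MCP client config file; check the project's README for the exact fields it expects before relying on this layout.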

Highlighted Details

  • Vectorless, reasoning-based RAG system.
  • Hierarchical document tree indexing and tree search retrieval.
  • Eliminates vector databases and document chunking.
  • Preserves full document context and structure.
  • Enables human-like, transparent information retrieval.
  • Integrates with Claude, Cursor, Vercel AI SDK, OpenAI Agents SDK, LangChain.
  • Supports local and online PDFs; offers a free tier (1000 pages).

Maintenance & Community

No specific details on contributors, sponsorships, or community channels were found in the provided README.

Licensing & Compatibility

Licensed under the MIT open-source license, which is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The README does not describe specific limitations, alpha status, or known bugs. The local server requires Node.js, which may be a barrier in some environments. The README also does not discuss how the vectorless approach performs or scales relative to established vector-based RAG systems.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 55 stars in the last 30 days
