codemogger by glommer

Local code indexing and search for AI coding agents

Created 3 months ago

318 stars

Top 85.0% on SourcePulse

Project Summary

Codemogger is a local, self-contained code indexing library and MCP server for AI coding agents. It parses source files with tree-sitter, splits them into semantic chunks of logical units, and stores the chunks together with locally computed embeddings in a single SQLite database. Agents can then run fast, precise keyword searches and natural-language semantic queries without external servers or API keys.

How It Works

Codemogger scans a codebase, respecting .gitignore, and uses tree-sitter (WASM) to parse each file into an Abstract Syntax Tree (AST). It chunks the tree along definitions such as functions, structs, and classes, encodes each chunk with a local embedding model (all-MiniLM-L6-v2 by default), and stores the results in an embedded SQLite database. The database combines Full-Text Search (FTS) for keyword matching with vector search for semantic similarity. Indexing is incremental: only files whose SHA-256 hash has changed are re-processed.
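The hash-based incremental step can be illustrated with a minimal sketch. It is not codemogger's implementation: the `planReindex` helper, the `ReindexPlan` shape, and the in-memory maps are hypothetical stand-ins for whatever the tool keeps in its SQLite database. The sketch only shows the general idea of comparing a stored SHA-256 digest with the digest of the current file contents.

```typescript
import { createHash } from "node:crypto";

/** Hex-encoded SHA-256 digest of a file's contents. */
function sha256Hex(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

/** Outcome of comparing the current files against the stored hashes. */
interface ReindexPlan {
  changed: string[];   // new or modified files: re-chunk and re-embed
  removed: string[];   // files that disappeared: drop their chunks
  unchanged: string[]; // digest matches: skip
}

/**
 * Hypothetical helper (not part of codemogger's API).
 * `stored` maps path -> digest recorded by the previous index run.
 * `current` maps path -> current file contents.
 */
function planReindex(
  stored: Map<string, string>,
  current: Map<string, string>,
): ReindexPlan {
  const result: ReindexPlan = { changed: [], removed: [], unchanged: [] };
  current.forEach((content, path) => {
    if (stored.get(path) === sha256Hex(content)) {
      result.unchanged.push(path);
    } else {
      result.changed.push(path); // covers both new and modified files
    }
  });
  stored.forEach((_digest, path) => {
    if (!current.has(path)) result.removed.push(path);
  });
  return result;
}

// Example: one file untouched, one edited, one added, one deleted.
const stored = new Map<string, string>([
  ["src/a.ts", sha256Hex("export const a = 1;")],
  ["src/b.ts", sha256Hex("export const b = 2;")],
  ["src/old.ts", sha256Hex("export const old = 0;")],
]);
const current = new Map<string, string>([
  ["src/a.ts", "export const a = 1;"],
  ["src/b.ts", "export const b = 3;"],
  ["src/new.ts", "export const c = 4;"],
]);
const plan = planReindex(stored, current);
console.log(plan);
```

In this sketch only `src/b.ts` and `src/new.ts` would go back through parsing and embedding, which is where most of the indexing cost lies.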

Quick Start & Requirements

Installation: Global npm install (npm install -g codemogger) or via npx.
Prerequisites: Node.js/npm. Supports 13 languages including Rust, C/C++, Go, Python, Zig, Java, Scala, JavaScript/TypeScript, PHP, and Ruby.
Usage: Index a project with codemogger index ./my-project and search using codemogger search "query". It can be integrated as an MCP server via a JSON configuration.
SDK: A TypeScript SDK is available, allowing users to provide their own embedding functions.
Links: No external documentation or demo links are provided in the README.

Highlighted Details

Performance: In the README's benchmarks, keyword search is 25x-370x faster than ripgrep and returns precise definitions. Semantic search finds relevant code from natural-language queries, which helps when the exact terms are unknown.
Efficiency: Utilizes int8 quantized embeddings (395 bytes/chunk) and a quantized embedding model for reduced storage and faster local CPU processing.
Single DB: Consolidates multiple codebases into a single SQLite file, simplifying management and deployment.
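To make the storage figure above more concrete, here is a rough sketch of int8 quantization for an embedding vector. The scheme shown (symmetric, one scale per vector) is an assumption chosen for illustration, and `quantizeInt8` and `cosineInt8` are hypothetical names; the page does not say which scheme codemogger uses. all-MiniLM-L6-v2 produces 384-dimensional vectors, so one byte per dimension gives 384 bytes, close to the 395 bytes/chunk quoted above. The sketch does not try to account for the difference.

```typescript
/** A quantized vector: int8 codes plus the scale needed to decode them. */
interface QuantizedVector {
  codes: Int8Array; // one byte per dimension
  scale: number;    // original value is approximately code * scale
}

/** Symmetric per-vector int8 quantization (assumed scheme, for illustration). */
function quantizeInt8(vector: Float32Array): QuantizedVector {
  let maxAbs = 0;
  for (let i = 0; i < vector.length; i++) {
    maxAbs = Math.max(maxAbs, Math.abs(vector[i]));
  }
  // Map the largest magnitude to 127; an all-zero vector keeps scale 1.
  const scale = maxAbs === 0 ? 1 : maxAbs / 127;
  const codes = new Int8Array(vector.length);
  for (let i = 0; i < vector.length; i++) {
    codes[i] = Math.round(vector[i] / scale);
  }
  return { codes, scale };
}

/** Cosine similarity computed directly on int8 codes. */
function cosineInt8(a: Int8Array, b: Int8Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  // Per-vector scales cancel in the ratio, so they are not needed here.
  return dot / Math.sqrt(normA * normB);
}

// Example with stand-in values (not a real model output).
const dims = 384; // output size of all-MiniLM-L6-v2
const embedding = new Float32Array(dims);
for (let i = 0; i < dims; i++) embedding[i] = Math.sin(i);

const quantized = quantizeInt8(embedding);
console.log(embedding.byteLength, "bytes as float32");
console.log(quantized.codes.byteLength, "bytes as int8");
console.log(cosineInt8(quantized.codes, quantized.codes));
```

The trade-off is a fourfold reduction in vector size relative to float32 at the cost of some rounding error in each dimension.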

Maintenance & Community

The provided README does not contain specific details regarding maintainers, community channels (e.g., Discord, Slack), sponsorships, or a public roadmap.

Licensing & Compatibility

License: MIT.
Compatibility: The permissive MIT license supports commercial use and integration into closed-source applications.

Limitations & Caveats

The README does not detail specific limitations, alpha status, or known bugs. Its performance benchmarks were run on an Apple M2 (8GB), so results may differ on other hardware. Chunk quality, and therefore search quality, depends on the tree-sitter grammar for each supported language.

Explore Similar Projects

sdl-mcp by GlitterKill

Mantic.sh by marcoaapfortes

agent-skill by ast-grep

Claude-ast-index-search by defendend

next-plaid by lightonai

probe by probelabs

osgrep by Ryandonofrio3

chunkhound by chunkhound

grepai by yoanbernabeu

cocoindex-code by cocoindex-io

jcodemunch-mcp by jgravelle

codegraph by colbymchenry