osgrep by Ryandonofrio3

Semantic code search for local development and AI agents

Created 3 months ago

1,077 stars

Top 35.2% on SourcePulse

4 Experts Love This Project

Cofounder of Sourcegraph

Project Summary

osgrep provides semantic, natural-language search for codebases, functioning like grep but understanding concepts rather than just strings. It targets developers and power users seeking a fast, local, and private solution for code exploration, especially when integrating with AI coding agents. The primary benefit is enhanced code comprehension through intelligent, context-aware search capabilities.

How It Works

osgrep runs transformer models locally via transformers.js to generate embeddings for code, which enables semantic search. It uses tree-sitter for smart chunking, splitting code at logical boundaries such as functions and classes so that each chunk captures a complete concept. A hybrid search approach merges vector search results with keyword search results using Reciprocal Rank Fusion (RRF) for improved accuracy. Adaptive throttling monitors CPU and RAM usage and adjusts indexing concurrency accordingly, so that indexing does not overload the machine.
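
The fusion step can be illustrated with a minimal TypeScript sketch of Reciprocal Rank Fusion. This is not osgrep's code: the function name `reciprocalRankFusion`, the file paths, and the constant k = 60 (a value commonly used with RRF) are assumptions made for illustration, since the summary does not state which parameters osgrep uses.

```typescript
// Reciprocal Rank Fusion: each document scores sum(1 / (k + rank)) over
// every ranked list it appears in. Ranks are 1-based.
// ASSUMPTION: k = 60 is a common default, not a value confirmed for osgrep.
function reciprocalRankFusion(
  rankings: string[][],
  k: number = 60
): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

// Hypothetical result lists, best match first.
const vectorHits = ["auth/login.ts", "auth/session.ts", "db/users.ts"];
const keywordHits = ["auth/session.ts", "utils/token.ts", "auth/login.ts"];

const fused = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fused.map(([id]) => id));
```

In this example auth/session.ts comes out on top because it sits near the top of both lists, while files found by only one search method fall to the bottom. RRF uses only rank positions, so the vector and keyword scores never need to be put on a common scale.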

Quick Start & Requirements

Install: npm install -g osgrep (or the pnpm or bun equivalent).
Prerequisites: A Node.js environment. Embedding models (~150MB) are downloaded up front with osgrep setup, or automatically on first use.
Resource footprint: Initial setup is dominated by the model download. Adaptive throttling aims to limit resource use while indexing.
Links: The README provides no links to external documentation.

Highlighted Details

Semantic Search: Finds concepts (e.g., "authentication logic") rather than literal strings.
Local & Private: All embeddings and data are processed and stored entirely on the local machine.
Adaptive Throttling: Dynamically adjusts indexing concurrency based on system CPU and RAM usage to prevent performance issues.
Hybrid Search: Combines vector (semantic) and full-text (keyword) search using RRF for enhanced accuracy.
Auto-Repository Isolation: Automatically creates and manages separate indexes for each codebase, simplifying multi-repo workflows.
Agent Integration: Offers native integration with coding agents like Claude Code.
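
The adaptive throttling item above can be pictured with a small TypeScript sketch. It is only one plausible policy and not osgrep's implementation: the `ThrottleConfig` fields, the thresholds, and the "halve under pressure, otherwise add one worker" rule are all assumptions. In a Node.js process the two utilisation readings could be derived from the built-in os module (for example os.loadavg(), os.freemem(), and os.totalmem()); here they are passed in as plain numbers to keep the example self-contained.

```typescript
// Hypothetical tuning knobs; none of these names come from osgrep.
interface ThrottleConfig {
  minConcurrency: number; // never drop below this many workers
  maxConcurrency: number; // never exceed this many workers
  cpuHighWater: number;   // CPU utilisation (0..1) treated as "under pressure"
  memHighWater: number;   // RAM utilisation (0..1) treated as "under pressure"
}

// Decide how many indexing workers to run next, given the current count
// and the latest CPU and RAM utilisation samples (both in the range 0..1).
// ASSUMPTION: back off quickly (halve) under pressure, recover slowly (+1).
function nextConcurrency(
  current: number,
  cpuLoad: number,
  memUsed: number,
  cfg: ThrottleConfig
): number {
  const underPressure = cpuLoad > cfg.cpuHighWater || memUsed > cfg.memHighWater;
  if (underPressure) {
    return Math.max(cfg.minConcurrency, Math.floor(current / 2));
  }
  return Math.min(cfg.maxConcurrency, current + 1);
}

const exampleConfig: ThrottleConfig = {
  minConcurrency: 1,
  maxConcurrency: 8,
  cpuHighWater: 0.85,
  memHighWater: 0.9,
};

// CPU is busy, so back off; the system is idle, so ramp up by one.
console.log(nextConcurrency(8, 0.95, 0.5, exampleConfig));
console.log(nextConcurrency(3, 0.2, 0.3, exampleConfig));
```

Backing off faster than ramping up is a conservative choice: it gives up some indexing throughput in exchange for keeping the machine responsive during load spikes.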

Maintenance & Community

The project acknowledges mgrep by MixedBread as a foundational influence, with significant rewrites for local-only operation. No specific contributors, sponsorships, or community channels (e.g., Discord, Slack) are detailed in the provided README.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool relies on the Node.js ecosystem. Initial embedding model downloads require approximately 150MB of storage. While adaptive throttling is implemented, performance may vary based on the complexity of the codebase and the user's hardware.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

56 stars in the last 30 days