osgrep by Ryandonofrio3

Semantic code search for local development and AI agents

Created 1 week ago

710 stars

Top 48.2% on SourcePulse

Project Summary

osgrep provides semantic, natural-language search for codebases: it works like grep but matches concepts rather than literal strings. It targets developers and power users who want a fast, local, and private tool for code exploration, especially alongside AI coding agents. The primary benefit is better code comprehension through context-aware search.

How It Works

osgrep employs local transformer models via transformers.js to generate embeddings for code, enabling semantic search. It utilizes tree-sitter for smart chunking, splitting code by logical boundaries like functions and classes to capture complete concepts. A hybrid search approach combines vector search with keyword search using Reciprocal Rank Fusion (RRF) for improved accuracy. Adaptive throttling monitors system resources (CPU/RAM) to dynamically adjust indexing performance, ensuring it runs efficiently without overheating machines.
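
The fusion step can be illustrated with a minimal sketch of Reciprocal Rank Fusion. This is not osgrep's code: the function name, the file names, and the constant k = 60 (a value commonly used with RRF) are assumptions made for illustration. Each result earns 1 / (k + rank) from every ranked list it appears in, and results are ordered by the summed score.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of result IDs into one ranking.

    `rankings` is a list of lists, each ordered best-first.
    `k` is a smoothing constant; 60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A result earns more credit the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from the two retrievers (file names are made up).
vector_hits = ["auth.ts", "session.ts", "login.ts"]
keyword_hits = ["login.ts", "auth.ts", "README.md"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Results that both retrievers rank highly rise to the top, while a result found by only one retriever still appears, just lower in the fused list.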

Quick Start & Requirements

  • Install: npm install -g osgrep (pnpm and bun also work).
  • Prerequisites: A Node.js environment. Embedding models (~150MB) are downloaded upfront via osgrep setup or automatically on first use.
  • Setup time and footprint: Initial setup is dominated by the model download. During indexing, adaptive throttling limits CPU and RAM usage.
  • Links: No external documentation links are provided in the README.

Highlighted Details

  • Semantic Search: Finds concepts (e.g., "authentication logic") rather than literal strings.
  • Local & Private: All embeddings and data are processed and stored entirely on the local machine.
  • Adaptive Throttling: Dynamically adjusts indexing concurrency based on system CPU and RAM usage to prevent performance issues.
  • Hybrid Search: Combines vector (semantic) and full-text (keyword) search using RRF for enhanced accuracy.
  • Auto-Repository Isolation: Automatically creates and manages separate indexes for each codebase, simplifying multi-repo workflows.
  • Agent Integration: Offers native integration with coding agents like Claude Code.
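
As a rough illustration of the adaptive throttling idea listed above, the sketch below adjusts a worker count from CPU and memory readings. It is not osgrep's implementation: the function name, the thresholds, and the halve-on-pressure, grow-by-one policy are all assumptions chosen for the example. The readings are passed in as fractions between 0 and 1 so the logic stays independent of how they are measured.

```python
def next_concurrency(current, cpu_load, mem_used, max_workers=8,
                     cpu_high=0.85, mem_high=0.80,
                     cpu_low=0.50, mem_low=0.60):
    """Return the worker count to use for the next indexing batch.

    All thresholds are hypothetical values chosen for illustration.
    """
    if cpu_load > cpu_high or mem_used > mem_high:
        # Under pressure: back off quickly, but keep at least one worker.
        return max(1, current // 2)
    if cpu_load < cpu_low and mem_used < mem_low:
        # Plenty of headroom: ramp up slowly, capped at max_workers.
        return min(max_workers, current + 1)
    # In between: hold steady to avoid oscillating.
    return current


print(next_concurrency(4, cpu_load=0.95, mem_used=0.50))  # backs off
print(next_concurrency(4, cpu_load=0.30, mem_used=0.40))  # ramps up
```

Backing off faster than ramping up is one common design choice for this kind of control loop, since it trades some throughput for a machine that stays responsive.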

Maintenance & Community

The project acknowledges mgrep by MixedBread as a foundational influence, with significant rewrites for local-only operation. No specific contributors, sponsorships, or community channels (e.g., Discord, Slack) are detailed in the provided README.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool relies on the Node.js ecosystem. Initial embedding model downloads require approximately 150MB of storage. While adaptive throttling is implemented, performance may vary based on the complexity of the codebase and the user's hardware.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 20
  • Issues (30d): 30
  • Star History: 729 stars in the last 10 days
