MinishLab
Fast, accurate code search for AI agents
Semble: Fast and Accurate Code Search for Agents
Semble is a specialized code search library engineered for AI agents, designed to drastically reduce token consumption and latency compared to traditional tools like grep+read. It provides agents with instant access to precise code snippets, enabling faster and more efficient code understanding and generation workflows. By indexing and searching entire codebases in under a second, Semble aims to be a foundational tool for agent-based software development.
How It Works
Semble employs a hybrid retrieval strategy combining semantic and lexical search. It segments code files into manageable chunks using the "Chonkie" library. Queries are then scored against these chunks using two complementary methods: static Model2Vec embeddings derived from a code-specialized model for semantic similarity, and BM25 for lexical matching of identifiers and API names. These scores are fused using Reciprocal Rank Fusion (RRF). The results are further refined by a sophisticated set of code-aware ranking signals, including adaptive weighting based on query type (natural language vs. symbol-like), boosting definitions of queried symbols, matching identifier stems, promoting file coherence, and penalizing noise from test or example files. This multi-stage approach allows for high accuracy and speed, running entirely on CPU without requiring computationally expensive transformer forward passes at query time.
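The fusion step described above can be illustrated with a minimal sketch of weighted Reciprocal Rank Fusion. This is not Semble's implementation: the function names, the symbol-detection heuristic, the weights (0.7 / 0.3), and the constant k = 60 are all assumptions chosen for illustration. The inputs are two ranked lists of chunk IDs, one from a semantic ranker and one from a lexical ranker such as BM25.

```python
import re
from collections import defaultdict


def looks_like_symbol(query: str) -> bool:
    """Hypothetical heuristic: a single token with snake_case, camelCase,
    a dot, or a colon is treated as symbol-like rather than natural language."""
    q = query.strip()
    if " " in q:
        return False
    return bool(re.search(r"[_.:]|[a-z][A-Z]", q))


def rrf_fuse(semantic, lexical, query, k=60):
    """Fuse two ranked lists of chunk IDs with weighted Reciprocal Rank Fusion.

    Each list contributes weight / (k + rank) per chunk. The weights are
    assumed values: symbol-like queries lean on the lexical ranking,
    natural-language queries lean on the semantic ranking.
    """
    w_sem, w_lex = (0.3, 0.7) if looks_like_symbol(query) else (0.7, 0.3)
    scores = defaultdict(float)
    for weight, ranking in ((w_sem, semantic), (w_lex, lexical)):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += weight / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))


semantic_ranking = ["a", "b", "c"]  # e.g. ordered by embedding similarity
lexical_ranking = ["c", "a", "d"]   # e.g. ordered by BM25 score

print(rrf_fuse(semantic_ranking, lexical_ranking, "how are configs loaded"))
print(rrf_fuse(semantic_ranking, lexical_ranking, "load_config"))
```

Because RRF works on ranks rather than raw scores, the two rankers do not need comparable score scales, which is one reason it suits combining embedding similarity with BM25.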
Quick Start & Requirements
Install: pip install semble or uv add semble.
uv is recommended for MCP server setup.
Interfaces: a Python API (SembleIndex.from_path, SembleIndex.from_git, index.search, index.find_related) and a standalone CLI (semble search, semble find-related).
Highlighted Details
Lower token consumption than grep+read by returning only relevant code chunks.
Maintenance & Community
The provided README does not detail specific maintenance schedules, notable contributors, sponsorships, or community channels (e.g., Discord, Slack). The project is authored by Thomas van Dongen and Stephan Tulkens.
Licensing & Compatibility
Semble is released under the MIT license. This permissive license allows for broad compatibility with commercial use and integration into closed-source projects.
Limitations & Caveats
While Semble offers superior speed and token efficiency for agent-based code search, the README suggests that traditional grep remains preferable for exhaustive literal string matching or quick confirmation of exact text. Additionally, Semble actively down-ranks code found in test files, compatibility shims, example directories, and declaration stubs, which may be a limitation if these specific code types are the primary search target.
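To make the down-ranking caveat concrete, here is a minimal sketch of a path-based penalty applied after retrieval. The path patterns, the multipliers, and the function names are hypothetical and are not taken from Semble. The sketch only shows why a test file with a higher raw score can end up below a source file.

```python
import re

# Hypothetical (pattern, multiplier) pairs; Semble's actual rules may differ.
NOISE_PATTERNS = [
    (re.compile(r"(^|/)(tests?|__tests__)/|(^|/)test_[^/]*\.py$|_test\.[a-z]+$"), 0.5),
    (re.compile(r"(^|/)(examples?|samples?)/"), 0.6),
    (re.compile(r"(^|/)(compat|shims?)(/|\.|_)"), 0.7),
    (re.compile(r"\.(pyi|d\.ts)$"), 0.7),  # declaration stubs
]


def penalized_score(path: str, score: float) -> float:
    """Scale a relevance score by the strongest matching penalty, if any."""
    factor = 1.0
    for pattern, multiplier in NOISE_PATTERNS:
        if pattern.search(path):
            factor = min(factor, multiplier)
    return score * factor


def rerank(results):
    """Sort (path, score) pairs by penalized score, best first."""
    return sorted(results, key=lambda r: penalized_score(*r), reverse=True)


results = [
    ("tests/test_loader.py", 0.90),
    ("src/loader.py", 0.80),
    ("examples/demo.py", 0.85),
]
for path, score in rerank(results):
    print(path, score)
```

Under a scheme like this, a query that targets test code directly has to overcome the penalty, which is the situation in which an exact-match tool such as grep may be the better choice.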