chunkhound by chunkhound

Deep code and file research engine for AI assistants

Created 6 months ago
268 stars

Top 95.8% on SourcePulse

Project Summary

This project addresses the challenge of making codebases deeply searchable for AI assistants by transforming them into knowledge bases. It targets engineers, researchers, and power users who need to understand complex code relationships and discover features semantically. ChunkHound offers a local-first, privacy-preserving solution that enhances AI-assisted code research and development.

How It Works

ChunkHound leverages the research-backed cAST (Chunking via Abstract Syntax Trees) algorithm for semantic code chunking, preserving code meaning through structure-aware parsing. It employs Multi-Hop Semantic Search to uncover interconnected code relationships beyond simple keyword matches, enabling natural language queries like "find authentication code" to discover related components. The system operates on a local-first architecture, ensuring code privacy and enabling offline use with local models. It supports structured parsing for 29 languages via Tree-sitter and custom parsers, providing consistent semantic understanding across diverse codebases.
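
The chunking idea can be illustrated with a short, self-contained sketch. The example below is not ChunkHound's cAST implementation (which uses Tree-sitter and covers 29 languages); it only shows the underlying principle of splitting code at syntax-tree boundaries instead of fixed-size text windows, using Python's standard ast module.

    import ast

    def chunk_python_source(source: str) -> list[str]:
        """Illustrative only: one chunk per top-level function or class.

        ChunkHound's actual cAST chunking (Tree-sitter based, 29 languages)
        is more sophisticated than this single-file sketch.
        """
        tree = ast.parse(source)
        lines = source.splitlines()
        chunks = []
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # lineno/end_lineno give the node's span (Python 3.8+),
                # so each chunk remains a syntactically complete unit.
                chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
        return chunks

    sample = "def login(user):\n    return verify(user)\n\nclass Session:\n    pass\n"
    for chunk in chunk_python_source(sample):
        print("--- chunk ---")
        print(chunk)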

Quick Start & Requirements

  • Installation: Requires Python 3.10+ and the uv package manager. Install uv via curl -LsSf https://astral.sh/uv/install.sh | sh, then install ChunkHound with uv tool install chunkhound.
  • Prerequisites: Semantic search requires an embedding provider (e.g., OpenAI, VoyageAI, or a local Ollama instance); regex search works without any API key.
  • Setup: Create a .chunkhound.json configuration file specifying, for example, the embedding provider and API key (a sketch follows this list), then index your codebase with chunkhound index.
  • Documentation: Comprehensive guides are available at chunkhound.github.io.
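
Putting the quick-start steps together, a minimal setup sketch might look like the following. The exact .chunkhound.json schema is not spelled out in this summary, so the key names below (embedding provider and API key) are illustrative assumptions; consult chunkhound.github.io for the authoritative layout. Only the chunkhound index command is documented above.

    # Illustrative setup sketch -- the config keys are assumptions, not a
    # confirmed schema; see chunkhound.github.io for the real layout.
    import json
    import subprocess

    config = {
        "embedding": {
            "provider": "openai",   # could also be VoyageAI or a local Ollama model
            "api_key": "sk-...",    # skip API keys entirely to use regex-only search
        }
    }

    with open(".chunkhound.json", "w") as fh:
        json.dump(config, fh, indent=2)

    # Run the documented indexing command over the codebase.
    subprocess.run(["chunkhound", "index"], check=True)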

Highlighted Details

  • Research Foundation: Built on the cAST algorithm, which demonstrates significant gains in retrieval recall and on code-generation benchmarks (RepoEval, SWE-bench).
  • Local-First Architecture: Code remains on the user's machine, supporting offline use with Ollama and avoiding per-token costs for local models.
  • Universal Language Support: Structured parsing for 29 languages, including common programming languages and configuration files.
  • Intelligent Code Discovery: Multi-hop search, automatic feature pattern discovery (e.g., finding "authentication" yields related code), and convergence detection (see the conceptual sketch after this list).
  • Real-Time Indexing: Features automatic file watching, efficient updates via smart content diffs, seamless Git branch switching, and live memory systems for documentation.
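
The multi-hop idea above can be sketched conceptually: results from one search hop seed the queries for the next hop, and the loop stops once no new chunks are discovered (convergence). This is a generic illustration that assumes chunk embeddings are already available as a NumPy array; it is not ChunkHound's actual search or scoring logic.

    # Conceptual multi-hop search sketch -- not ChunkHound's implementation.
    import numpy as np

    def top_k(query: np.ndarray, embeddings: np.ndarray, k: int) -> set[int]:
        """Indices of the k chunk embeddings most cosine-similar to the query."""
        sims = embeddings @ query / (
            np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query) + 1e-9
        )
        return set(np.argsort(-sims)[:k])

    def multi_hop_search(query_vec, chunk_vecs, k=5, max_hops=3) -> set[int]:
        found: set[int] = set()
        frontier = [query_vec]
        for _ in range(max_hops):
            hits = set()
            for q in frontier:
                hits |= top_k(q, chunk_vecs, k)
            new = hits - found
            if not new:                                # convergence: nothing new found
                break
            frontier = [chunk_vecs[i] for i in new]    # hop again from the new results
            found |= new
        return found

In practice, a query such as "find authentication code" would be embedded first, and each returned chunk's embedding would drive the next hop until the result set stops growing.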

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README excerpt.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license is permissive, generally allowing for commercial use and integration within closed-source projects.

Limitations & Caveats

Advanced semantic search capabilities require configuration with external API keys or a local Ollama setup. The README details complex exclusion and workspace overlay configurations that may require careful tuning. While benchmarks are cited, real-world performance may vary based on codebase size and complexity.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 27
  • Issues (30d): 9
  • Star History: 96 stars in the last 30 days
