codeqai by fynnfluegge

CLI tool for local semantic code search and chat

created 1 year ago
489 stars

Top 64.0% on sourcepulse

Project Summary

This project provides a local-first solution for semantic code search and chat, and can generate datasets from your code for fine-tuning custom copilots. It targets developers seeking private, efficient code analysis and interaction tools.

How It Works

The system parses codebases using Tree-sitter for accurate syntax analysis, generating embeddings with Sentence-Transformers, Instructor-Embedding, or OpenAI models. These embeddings are stored in a FAISS vector database for fast semantic search. For chat functionality, it integrates with llama.cpp or Ollama for local LLM inference, or supports OpenAI/Azure OpenAI/Anthropic APIs. Synchronization with Git ensures the vector store remains up-to-date with code changes.
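The embed-store-search loop described above can be illustrated with a toy sketch. Here a simple bag-of-words vector stands in for the real Sentence-Transformers embeddings and brute-force cosine similarity stands in for the FAISS index; the function names are illustrative, not codeqai's API:

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words frequency vector over lowercase words.
    codeqai would use Sentence-Transformers, Instructor, or OpenAI here."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Index" some code chunks (codeqai extracts chunks with Tree-sitter
# and stores their embeddings in a FAISS index on disk).
chunks = [
    "def read_config(path): return json.load(open(path))",
    "def connect_db(url): return psycopg2.connect(url)",
    "def render_template(name, ctx): return env.get_template(name).render(ctx)",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def search(query, k=1):
    """Return the k chunks most similar to the natural-language query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

print(search("load a config file from path"))
```

The same shape scales up in the real tool: better embeddings make semantically related code (not just keyword matches) rank highest, and FAISS makes the nearest-neighbor lookup fast over large codebases.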

Quick Start & Requirements

  • Install via pipx install codeqai.
  • Requires Python >=3.9, <3.12.
  • faiss-cpu or faiss-gpu (recommended for CUDA 7.5+) must be installed.
  • Local embeddings require sentence-transformers or instructor; local LLM inference requires llama.cpp or Ollama.
  • Remote model usage requires API keys (OpenAI, Azure OpenAI, Anthropic).
  • Initial indexing may take time.
  • See Troubleshooting for installation issues.
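A typical first session might look like the following (assuming a Python 3.9–3.11 interpreter and pipx are available; command names follow the project's CLI but should be checked against codeqai --help):

```shell
# Install the CLI into an isolated environment (Python >=3.9, <3.12)
pipx install codeqai

# Choose embeddings/LLM backends and set API keys on first run
codeqai configure

# Index the current repository (first run) and search it semantically
codeqai search

# Chat with the codebase using the configured LLM
codeqai chat
```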

Highlighted Details

  • Supports dataset generation for fine-tuning in Alpaca, conversational, instruction, or completion formats.
  • Integrates with Tree-sitter for parsing multiple languages including Python, TypeScript, JavaScript, Java, Rust, Kotlin, Go, C++, C, C#, and Ruby.
  • Offers 100% local processing for embeddings and LLMs, ensuring data privacy.
  • Provides a Streamlit UI for an interactive experience.
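For the dataset-generation feature, the Alpaca format mentioned above uses standard instruction/input/output triples. The record below is a hand-written illustration of that schema, not codeqai's actual output:

```python
import json

# A standard Alpaca-format record: instruction/input/output triples.
# codeqai can emit fine-tuning datasets in this shape; the generated
# records pair prompts with code from your own repository.
record = {
    "instruction": "Write a Python function that returns the nth Fibonacci number.",
    "input": "",
    "output": (
        "def fib(n):\n"
        "    a, b = 0, 1\n"
        "    for _ in range(n):\n"
        "        a, b = b, a + b\n"
        "    return a"
    ),
}

# Fine-tuning datasets are typically stored as JSON Lines, one record per line.
line = json.dumps(record)
print(line)
```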

Maintenance & Community

The project is actively maintained with CI/CD pipelines for build and publish. Contributions are welcomed via issues or pull requests. Development can be managed with Conda or Poetry.

Licensing & Compatibility

Licensed under Apache 2.0, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

faiss-cpu wheels are not available for Python 3.12, so an earlier Python version is required for installation. Search and chat quality depends on how well the code is documented. llama.cpp requires models in GGUF format to be downloaded in advance.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days
