DeepGit by zamalali

Research agent for discovering relevant GitHub repositories

created 7 months ago
778 stars

Top 45.8% on sourcepulse

1 Expert Loves This Project
Project Summary

DeepGit is an agentic workflow for in-depth GitHub repository research, aimed at developers and researchers who want to discover relevant, under-the-radar open-source tools. It combines hybrid retrieval (ColBERT v2 embeddings plus cross-encoder re-ranking) with hardware awareness and activity analysis to produce contextually relevant recommendations.

How It Works

DeepGit runs as a LangGraph-based agentic workflow. It begins by expanding the query and detecting the user's hardware specifications. A ColBERT v2 retriever then performs token-level semantic search, and a MiniLM-L6-v2 cross-encoder re-ranks the results at the passage level. Next, a hardware-aware filter discards repositories that are incompatible with the detected hardware constraints. Finally, community and code activity metrics are analyzed, and a multi-factor ranking is presented to the user. This approach enables fine-grained similarity matching and helps ensure that the discovered tools are practically usable on the user's hardware.
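The token-level matching step can be illustrated with a minimal sketch of ColBERT-style late interaction (MaxSim) scoring. This is not DeepGit's code: the function names, the toy two-dimensional embeddings, and the repository names are all assumptions made for illustration, and a real system would obtain token embeddings from a ColBERT v2 checkpoint rather than hand-written vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def maxsim_score(query_tokens, doc_tokens):
    """Late-interaction score: for each query token, take its best-matching
    document token, then sum those maxima over all query tokens."""
    return sum(
        max((cosine(q, d) for d in doc_tokens), default=0.0)
        for q in query_tokens
    )


def retrieve(query_tokens, corpus, top_k=2):
    """Score every document and return the top_k (name, score) pairs."""
    scored = [(name, maxsim_score(query_tokens, toks)) for name, toks in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: each inner list is one token embedding.
query = [[1.0, 0.0], [0.0, 1.0]]
corpus = {
    "repo-a": [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    "repo-b": [[1.0, 0.0], [1.0, 0.0]],  # matches only the first query token
    "repo-c": [[-1.0, 0.0]],             # points away from the query
}

candidates = retrieve(query, corpus)
print(candidates)
```

In a two-stage pipeline like the one described, the candidates returned here would then be passed to a cross-encoder, which scores each query and passage pair jointly and reorders the shortlist.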

Quick Start & Requirements

  • Clone the repository, then install dependencies with pip install -r requirements.txt.
  • Requires Python 3.11+ and pip 24.0+.
  • Run the application locally with python app.py.
  • Launch the LangSmith dashboard with langgraph dev.
  • Docker support is available.

Highlighted Details

  • Uses multi-dimensional ColBERT v2 embeddings for fine-grained, token-level similarity.
  • Includes a hardware filter that excludes repositories incompatible with the user's device specs (CPU-only, low RAM, mobile).
  • Combines hybrid dense retrieval, cross-encoder re-ranking, and activity analysis.
  • Provides an intuitive UI for query input and tabulated results with similarity scores and hardware compatibility badges.
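As a rough illustration of how a hardware filter and a multi-factor ranking could fit together, the sketch below first drops repositories that do not fit a device and then blends similarity with an activity score. Everything in it is an assumption: the Repo and Hardware fields, the 1000-star and 365-day normalization constants, and the 0.7/0.3 weights are hypothetical and do not come from the DeepGit README.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    similarity: float       # re-ranker score, assumed to lie in [0, 1]
    stars: int
    days_since_commit: int
    needs_gpu: bool
    min_ram_gb: int


@dataclass
class Hardware:
    has_gpu: bool
    ram_gb: int


def is_compatible(repo, hw):
    """Hypothetical rule: reject GPU-only repos on CPU-only devices and
    repos whose minimum RAM exceeds what the device has."""
    if repo.needs_gpu and not hw.has_gpu:
        return False
    return repo.min_ram_gb <= hw.ram_gb


def activity_score(repo):
    """Blend popularity and recency, each clamped to [0, 1]."""
    popularity = min(repo.stars / 1000, 1.0)
    recency = max(0.0, 1.0 - repo.days_since_commit / 365)
    return 0.5 * popularity + 0.5 * recency


def rank(repos, hw, w_sim=0.7, w_act=0.3):
    """Filter by hardware first, then sort by the weighted combined score."""
    usable = [r for r in repos if is_compatible(r, hw)]
    return sorted(
        usable,
        key=lambda r: w_sim * r.similarity + w_act * activity_score(r),
        reverse=True,
    )


# Hypothetical example data.
repos = [
    Repo("gpu-heavy", 0.95, 5000, 10, needs_gpu=True, min_ram_gb=32),
    Repo("cpu-friendly", 0.80, 400, 30, needs_gpu=False, min_ram_gb=4),
    Repo("stale-tool", 0.85, 50, 900, needs_gpu=False, min_ram_gb=2),
]
laptop = Hardware(has_gpu=False, ram_gb=8)

ranked = rank(repos, laptop)
print([r.name for r in ranked])
```

Filtering before ranking means a highly similar but unusable repository (here, the GPU-only one) never reaches the results, while the activity term lets a recently maintained project outrank a slightly more similar but stale one.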

Maintenance & Community

The project is open-source and actively developed. Links to community resources or roadmaps are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The "Lite" version running on Hugging Face Spaces may not perform as well as the full version. Because no license is specified, commercial adoption may be affected.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

110 stars in the last 90 days
