deep-searcher  by zilliztech

Deep search alternative for private data, using LLMs and vector DBs

Created 7 months ago
6,940 stars

Top 7.4% on SourcePulse

Project Summary

DeepSearcher is an open-source Python framework for building search and reasoning systems over private data. It combines Large Language Models (LLMs) with vector databases to answer questions and generate reports from enterprise knowledge bases. Its target uses are enterprise knowledge management and intelligent Q&A.

How It Works

DeepSearcher coordinates LLMs and embedding models with a vector database such as Milvus for retrieval. Users load local files or crawl websites, embed the content, store the embeddings in the vector database, and then query it through an LLM. Because the components are modular, users can choose each one to balance performance against cost.
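The load, embed, store, and query flow described above can be illustrated with a minimal, self-contained sketch. This is not DeepSearcher's API: the names embed, InMemoryVectorStore, and build_prompt are hypothetical, the "embedding" is a toy bag-of-words count, and the in-memory list stands in for a vector database such as Milvus. A real deployment would replace each piece with an actual embedding model, vector database, and LLM call.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: bag-of-words term counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database; keeps (vector, text) pairs in a list."""

    def __init__(self):
        self.rows = []

    def insert(self, chunks):
        # Embed each chunk once at load time and store it next to its text.
        for chunk in chunks:
            self.rows.append((embed(chunk), chunk))

    def search(self, question, top_k=2):
        # Rank stored chunks by similarity to the embedded question.
        q = embed(question)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]


def build_prompt(question, context):
    """Assemble the text that a real system would send to an LLM."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


store = InMemoryVectorStore()
store.insert([
    "Milvus Lite stores vectors in a local file.",
    "The cafeteria opens at nine.",
    "Zilliz Cloud is a managed vector database service.",
])
question = "Where does Milvus Lite store vectors?"
hits = store.search(question, top_k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Keeping retrieval (search) separate from prompt assembly (build_prompt) mirrors the modularity the summary describes: either side can be swapped without touching the other.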

Quick Start & Requirements

  • Installation: pip install deepsearcher, or pip install "deepsearcher[ollama]" to include optional dependencies. A development installation via uv sync is also supported.
  • Prerequisites: Python 3.10+ recommended. API keys for chosen LLMs and embedding models (e.g., OpenAI, DeepSeek, Anthropic, Google Gemini) are required. For local vector storage, Milvus Lite is used; for larger deployments, a Milvus server or Zilliz Cloud is recommended. Web crawling requires a FIRECRAWL_API_KEY.
  • Demo: A Python quick-start example is provided, requiring OPENAI_API_KEY for basic functionality.
  • Docs: Configuration Details, Quick Start Demo
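Since the requirements above depend on environment variables, a small pre-flight check can catch a missing key before anything else runs. The sketch below is an illustration only: missing_keys is a hypothetical helper, and the split into required and optional keys is an assumption based on the bullets above (OPENAI_API_KEY for the demo, FIRECRAWL_API_KEY for web crawling).

```python
import os

# Variable names come from the requirements above; the grouping is an assumption.
REQUIRED = ["OPENAI_API_KEY"]      # needed by the quick-start demo
OPTIONAL = ["FIRECRAWL_API_KEY"]   # only needed for web crawling


def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


for name in missing_keys(REQUIRED):
    print(f"missing required key: {name}")
for name in missing_keys(OPTIONAL):
    print(f"web crawling unavailable, missing: {name}")
```

Passing env explicitly makes the helper easy to exercise without touching the real environment.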

Highlighted Details

  • Supports a wide array of LLMs including OpenAI, Qwen, DeepSeek, Grok, Claude, and Llama.
  • Integrates with multiple embedding models and vector databases (Milvus, Zilliz Cloud, Qdrant).
  • Offers flexible data loading from local files and web crawling capabilities.
  • Provides a Python CLI for loading and querying, and a FastAPI service for API access.

Maintenance & Community

The project is maintained by Zilliz. Community engagement is encouraged via GitHub stars and forks.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

Some features, such as web crawling and certain document loaders, are noted as "under development." Offline mode for Hugging Face model downloads may require a network proxy or token configuration. Usage in Jupyter notebooks may require nest_asyncio.
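The Jupyter caveat most likely stems from a general asyncio restriction rather than anything specific to this project (an assumption; the summary does not say why). A notebook cell already runs inside an event loop, and asyncio.run refuses to start a second one. The standard-library sketch below reproduces that error; notebook_cell and answer are hypothetical names. In a real notebook, the usual workaround is to call nest_asyncio.apply() once before running async code, shown here only as a comment because nest_asyncio is a third-party package.

```python
import asyncio


async def answer():
    return "ok"


async def notebook_cell():
    # Simulates a Jupyter cell: an event loop is already running at this point.
    coro = answer()
    try:
        asyncio.run(coro)          # nested call, rejected by asyncio
    except RuntimeError as exc:
        coro.close()               # avoid a "never awaited" warning
        return str(exc)
    return "no error"


# In a notebook, the usual workaround would be:
#   import nest_asyncio
#   nest_asyncio.apply()
message = asyncio.run(notebook_cell())
print(message)
```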

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 140 stars in the last 30 days