doctor by sisig-ai

LLM agent tool for web crawling, indexing, and reasoning

created 3 months ago
453 stars

Top 67.6% on sourcepulse

Project Summary

Doctor is a system designed to equip LLM agents with the ability to discover, crawl, and index web content, enabling more up-to-date reasoning and code generation. It targets developers and researchers building AI agents that require access to current information from the web.

How It Works

Doctor orchestrates a pipeline involving web crawling (crawl4ai), text chunking (LangChain), embedding generation (OpenAI via litellm), and data storage with vector search (DuckDB). These components are managed via a unified database class and asynchronous task processing using Redis. The indexed data and search capabilities are exposed through a FastAPI web service, which also serves as an MCP server for seamless integration with LLM agents.
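The pipeline above (chunk, embed, store, search) can be illustrated with a minimal, self-contained sketch. It is not Doctor's code. A character-window splitter stands in for LangChain, a toy hashed bag-of-words vector stands in for OpenAI embeddings requested through litellm, and an in-memory list ranked by cosine similarity stands in for the DuckDB table. The names chunk_text, embed, and InMemoryIndex are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass, field

# Hypothetical stand-ins: none of these names come from Doctor itself.


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (stand-in for a LangChain splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final window still reaches the end of the text.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy hashed bag-of-words vector, L2-normalised (stand-in for an OpenAI embedding)."""
    vec = [0.0] * dim
    for raw in text.lower().split():
        token = raw.strip(".,;:!?()\"'")
        if token:
            bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
            vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryIndex:
    """Stand-in for a vector table: one (url, chunk, vector) row per chunk."""
    rows: list[tuple[str, str, list[float]]] = field(default_factory=list)

    def add_page(self, url: str, text: str) -> int:
        chunks = chunk_text(text)
        for chunk in chunks:
            self.rows.append((url, chunk, embed(chunk)))
        return len(chunks)

    def search(self, query: str, k: int = 3) -> list[tuple[float, str, str]]:
        q = embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(q, vec)), url, chunk)
                  for url, chunk, vec in self.rows]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


index = InMemoryIndex()
index.add_page("https://example.com/duckdb",
               "DuckDB is an in-process analytical database with vector search.")
index.add_page("https://example.com/redis",
               "Redis is an in-memory data store, often used as a task queue.")
for score, url, _chunk in index.search("vector search database", k=2):
    print(f"{score:.2f}  {url}")
```

Because every vector is normalised, ranking by dot product is ranking by cosine similarity. A real deployment would presumably let DuckDB perform this ranking over stored embeddings instead of scanning rows in Python.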

Quick Start & Requirements

  • Install/Run: Clone the repository, set the OPENAI_API_KEY environment variable, and run docker compose up.
  • Prerequisites: Docker, Docker Compose, Python 3.10+, uv, and an OpenAI API key.
  • Resources: Docker and an OpenAI API key; setup takes little time once the prerequisites are in place.
  • Docs: OpenAPI Docs, MCP Server Configuration.
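A small preflight check can confirm the prerequisites listed above before running docker compose up. This is an illustrative helper, not part of the Doctor repository; missing_prerequisites is a hypothetical name. Finding a docker binary on PATH does not by itself confirm that the Compose plugin is installed.

```python
import os
import shutil
import sys
from typing import Callable, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str],
    python_version: tuple[int, int],
    which: Callable[[str], Optional[str]],
) -> list[str]:
    """Return human-readable problems; an empty list means the basics look present."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if python_version < (3, 10):
        problems.append(
            f"Python 3.10+ required, found {python_version[0]}.{python_version[1]}"
        )
    for tool in ("docker", "uv"):
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites(os.environ, sys.version_info[:2], shutil.which):
        print("missing:", problem)
```

Passing the environment, version, and lookup function as arguments keeps the check easy to test without touching the real machine.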

Highlighted Details

  • Provides a full stack for web content indexing and LLM agent integration.
  • Exposes functionality via a FastAPI web service and an MCP server.
  • Utilizes DuckDB for efficient vector search and Redis for asynchronous task management.
  • Includes comprehensive testing infrastructure and pre-commit hooks for code quality.
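The asynchronous task pattern, where the API accepts a crawl request, returns a job ID immediately, and leaves the work to background workers, can be sketched in a single process. In this sketch asyncio.Queue stands in for Redis, and submit_job, worker, and the status strings are hypothetical; Doctor's actual job schema and worker code may differ.

```python
import asyncio
import uuid


async def submit_job(queue: asyncio.Queue, status: dict[str, str], url: str) -> str:
    """Record a job as queued and hand it to the workers; returns at once."""
    job_id = str(uuid.uuid4())
    status[job_id] = "queued"
    await queue.put((job_id, url))
    return job_id


async def worker(queue: asyncio.Queue, status: dict[str, str]) -> None:
    """Consume jobs forever; a real worker would crawl, chunk, embed, and store."""
    while True:
        job_id, url = await queue.get()
        try:
            status[job_id] = "running"
            await asyncio.sleep(0)  # placeholder for the crawl-and-index work on `url`
            status[job_id] = "completed"
        except Exception:
            status[job_id] = "failed"
        finally:
            queue.task_done()


async def main(urls: list[str]) -> dict[str, str]:
    queue: asyncio.Queue = asyncio.Queue()
    status: dict[str, str] = {}
    workers = [asyncio.create_task(worker(queue, status)) for _ in range(2)]
    job_ids = [await submit_job(queue, status, u) for u in urls]
    await queue.join()  # wait until every submitted job has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return {job_id: status[job_id] for job_id in job_ids}


statuses = asyncio.run(main(["https://example.com/a", "https://example.com/b"]))
print(statuses)
```

With an external broker such as Redis, the queue and the status map survive restarts and can be shared between the web service and separate worker processes, which an in-process queue cannot do.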

Maintenance & Community

The project includes Python tests and code coverage reports. The README does not link to community channels or a roadmap.

Licensing & Compatibility

Licensed under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Embedding generation requires an OpenAI API key, and API usage may incur costs. Deployment relies on Docker Compose, and the README notes version requirements for Python (3.10+) and Docker.
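Embedding cost scales with the number of tokens sent to the API, so a rough estimate is the total characters embedded, divided by characters per token, multiplied by the per-token price. The sketch below assumes OpenAI's rule of thumb of roughly four characters per token for English text and takes the price as a parameter; the figure in the example is illustrative, not a quote. Overlapping chunks are embedded in full, so count characters across all chunks rather than the raw page text. estimate_embedding_cost is a hypothetical helper.

```python
def estimate_embedding_cost(
    total_chars: int,
    usd_per_million_tokens: float,
    chars_per_token: float = 4.0,
) -> float:
    """Rough USD cost of embedding total_chars characters of text."""
    if chars_per_token <= 0:
        raise ValueError("chars_per_token must be positive")
    tokens = total_chars / chars_per_token  # ~4 chars/token is a heuristic, not exact
    return tokens / 1_000_000 * usd_per_million_tokens


# Illustrative price only; look up the current rate for your embedding model.
print(f"{estimate_embedding_cost(4_000_000, usd_per_million_tokens=0.02):.4f}")  # 0.0200
```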

Health Check

Last commit: 2 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 455 stars in the last 90 days

Explore Similar Projects

Perplexica by ItzCrazyKns
AI-powered search engine alternative
Top 0.3% on sourcepulse · 23k stars · created 1 year ago · updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Alex Cheema (Cofounder of EXO Labs), and 3 more.