deepdoc by Oqura-ai

AI-powered local document research and reporting

Created 10 months ago
253 stars

Top 99.4% on SourcePulse

Summary

Deepdoc is a research tool designed for local knowledge bases, enabling users to conduct in-depth analysis of their own documents (PDF, DOCX, JPG, TXT, etc.) instead of relying on internet searches. It automates the process of extracting insights, organizing findings, and generating structured markdown reports, making it valuable for researchers and power users who need to quickly synthesize information from large local datasets.

How It Works

The system ingests local documents, extracts their text, and splits it into page-wise chunks that are stored in a vector database for semantic search. The user supplies an instruction query, which guides the generation of a content structure for the report. Research agents then build the content for each report section iteratively: they create research queries, search the local data, and refine the results with the help of reflection agents. Finally, a report writer compiles the section content into a single markdown report. The result is a systematic, agent-driven exploration of local data.
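
The retrieval and reflection steps described above can be sketched in a few lines. This is a toy illustration only, not Deepdoc's code: a bag-of-words cosine score stands in for the embedding search that the project runs against Qdrant, and the names Chunk, chunk_by_page, search, and research_section are all hypothetical. The reflection step, which Deepdoc delegates to an LLM agent, is reduced here to a plain callback that returns a follow-up query or None.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Chunk:
    source: str  # file the page came from
    page: int    # 1-based page number
    text: str


def tokenize(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk_by_page(source: str, pages: list[str]) -> list[Chunk]:
    """One chunk per non-empty page, keeping the original page number."""
    return [Chunk(source, i, t) for i, t in enumerate(pages, start=1) if t.strip()]


def search(chunks: list[Chunk], query: str, top_k: int = 2) -> list[Chunk]:
    """Return up to top_k chunks with a non-zero similarity to the query."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(c.text)), c) for c in chunks]
    scored = [sc for sc in scored if sc[0] > 0]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return [c for _, c in scored[:top_k]]


def research_section(
    chunks: list[Chunk],
    query: str,
    reflect: Callable[[list[Chunk]], Optional[str]],
    max_rounds: int = 3,
) -> list[Chunk]:
    """Search, then let `reflect` propose a follow-up query until it returns None."""
    findings: list[Chunk] = []
    for _ in range(max_rounds):
        for hit in search(chunks, query):
            if hit not in findings:
                findings.append(hit)
        next_query = reflect(findings)
        if next_query is None:
            break
        query = next_query
    return findings


pages = [
    "Qdrant stores the embeddings for each page.",
    "",
    "The report writer compiles sections into markdown.",
]
chunks = chunk_by_page("notes.pdf", pages)
findings = research_section(
    chunks,
    "where are embeddings stored",
    # Hypothetical reflection rule: ask one follow-up until two pages are found.
    reflect=lambda found: "report writer markdown" if len(found) < 2 else None,
)
print([(c.source, c.page) for c in findings])
```

Bounding the loop with max_rounds keeps a reflection step that never signals completion from running indefinitely, which matters once the callback is an LLM call rather than a fixed rule.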

Quick Start & Requirements

  • Installation: Clone the repository, create and activate a virtual environment using uv, and install dependencies with uv pip install -r requirements.txt.
  • Prerequisites: uv (for environment/dependency management), Docker and Docker Compose (for Qdrant vector database), API keys for Mistral, Tavily, and OpenAI.
  • Configuration: Set API keys and other parameters (e.g., EMBEDDING_MODEL, QDRANT_URL) in a .env file. Customize LLM and thread configurations in configuration.py.
  • Running: Start Qdrant via docker-compose up --build, then run the application with python main.py.
  • Links: uv installation instructions are available in the official uv GitHub repository.
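
As a rough illustration of the configuration step, the sketch below reads settings from the process environment and fails early when an API key is missing. It assumes the .env file has already been loaded into the environment. EMBEDDING_MODEL and QDRANT_URL are named in the summary above; the key names MISTRAL_API_KEY, TAVILY_API_KEY, and OPENAI_API_KEY, the default values, and the Settings and load_settings names are assumptions made for this example and may not match the project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed variable names; check the project's .env template for the real ones.
REQUIRED_KEYS = ("MISTRAL_API_KEY", "TAVILY_API_KEY", "OPENAI_API_KEY")


@dataclass(frozen=True)
class Settings:
    embedding_model: str
    qdrant_url: str
    api_keys: dict


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED_KEYS if not env.get(k)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return Settings(
        # Placeholder default, not a real model name.
        embedding_model=env.get("EMBEDDING_MODEL", "example-embedding-model"),
        # 6333 is Qdrant's default HTTP port; the project's value may differ.
        qdrant_url=env.get("QDRANT_URL", "http://localhost:6333"),
        api_keys={k: env[k] for k in REQUIRED_KEYS},
    )
```

Accepting the environment as a parameter makes the check easy to exercise with a plain dictionary, without touching the real environment.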

Highlighted Details

  • Employs a multi-agent research workflow with "reflection agents" for iterative refinement of findings.
  • Supports a variety of local document types including PDF, DOCX, JPG, and TXT.
  • Generates structured markdown reports based on user-defined instructions and local data analysis.
  • Configurable LLM and research parameters allow tuning of the agentic workflow.
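
To make the report-generation step concrete, here is a minimal sketch of assembling per-section content into one markdown document. The build_report function and its (heading, body) input shape are hypothetical; how Deepdoc's report writer actually structures its output is not described in the summary.

```python
def build_report(title: str, sections: list[tuple[str, str]]) -> str:
    """Join (heading, body) pairs into a markdown report with one H1 and H2 sections."""
    lines = [f"# {title}", ""]
    for heading, body in sections:
        lines += [f"## {heading}", "", body.strip(), ""]
    # Collapse trailing blank lines into a single final newline.
    return "\n".join(lines).rstrip() + "\n"


report = build_report(
    "Storage Overview",
    [
        ("Vector Database", "Page-wise chunks are stored for semantic search."),
        ("Report Writer", "Section content is compiled into markdown."),
    ],
)
print(report)
```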

Maintenance & Community

The project is authored by Swaraj Biswal and Swadhin Biswal. Contributions are welcomed via issues or pull requests. No specific community channels (e.g., Discord, Slack) or sponsorship details are provided in the README.

Licensing & Compatibility

Licensed under the MIT License. This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The tool depends on the user providing valid API keys for external LLM and search services (Mistral, Tavily, OpenAI). Setup requires familiarity with uv, Docker, and environment variable management. The quality of the research output depends on the quality of the input documents and on how the research agents are configured.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

4 stars in the last 30 days
