cyber-doctor  by Warma10032

Multi-modal AI agent for personalized health assistance

Created 1 year ago
267 stars

Top 95.9% on SourcePulse

Project Summary

"Cyber Huatuo" (Warma10032/cyber-doctor) is a multi-modal intelligent agent that acts as a personal doctor assistant, with the goal of broadening access to healthcare. It combines large language models (LLMs) with knowledge graphs to offer preliminary disease diagnosis, medical record analysis, and professional health Q&A, aiming to narrow geographic disparities in medical resources. The project targets individuals concerned about their health, and it can be adapted to provide domain-specific expertise outside healthcare.

How It Works

The project integrates multiple AI models, orchestrated by an AI agent, to handle complex tasks. It features a core LLM backbone enhanced with Retrieval Augmented Generation (RAG) for knowledge base and internet retrieval, and a Neo4j knowledge graph for structured domain knowledge. A dedicated voice module supports speech-to-text (STT) and text-to-speech (TTS) for an accessible conversational interface. Multi-modal capabilities include image recognition for documents and generation of images and videos.
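
As a rough illustration of how retrieved passages and knowledge-graph facts might be merged into a single LLM prompt, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the function names (retrieve_passages, lookup_graph_facts, build_prompt), the toy data, and the keyword-overlap scoring are not taken from the repository, which relies on a real retriever, an LLM, and Neo4j instead of the in-memory stand-ins used here.

```python
# Hypothetical sketch: none of these names come from the cyber-doctor codebase.
import re

# Toy stand-in for a document knowledge base (the project uses files and web search).
KNOWLEDGE_BASE = [
    "Influenza commonly causes fever, cough, and muscle aches.",
    "Hypertension is often managed with diet changes and medication.",
    "Seasonal allergies can cause sneezing and itchy eyes.",
]

# Toy stand-in for a knowledge graph (the project uses Neo4j).
# Each entry is a (subject, relation, object) triple.
GRAPH_TRIPLES = [
    ("influenza", "has_symptom", "fever"),
    ("influenza", "has_symptom", "cough"),
    ("hypertension", "treated_by", "medication"),
]


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_passages(question, top_k=1):
    """Rank passages by keyword overlap with the question.

    This is a deliberate simplification of the retrieval step in RAG;
    real systems typically use embeddings or a search index.
    """
    q = tokenize(question)
    scored = [(len(q & tokenize(p)), p) for p in KNOWLEDGE_BASE]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def lookup_graph_facts(question):
    """Return triples whose subject or object is mentioned in the question."""
    q = tokenize(question)
    return [f"{s} {r} {o}" for s, r, o in GRAPH_TRIPLES if s in q or o in q]


def build_prompt(question):
    """Assemble retrieved context and graph facts into one prompt string."""
    lines = ["Context passages:"]
    lines += [f"- {p}" for p in retrieve_passages(question)]
    lines.append("Knowledge graph facts:")
    lines += [f"- {f}" for f in lookup_graph_facts(question)]
    lines.append(f"Question: {question}")
    return "\n".join(lines)


prompt = build_prompt("What are the symptoms of influenza?")
print(prompt)
```

In a real deployment the assembled prompt would be sent to the configured LLM; the sketch stops before that step so it stays independent of any particular model API.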

Quick Start & Requirements

  • Installation: Clone the repository (git clone).
  • Prerequisites: Python 3.10 or later (3.10 recommended). Using a Conda environment is recommended.
  • Dependencies: pip install -r requirements.txt.
  • Configuration: Copy .env.example to .env and fill in API keys for supported LLMs (OpenAI-compatible, ZhipuAI, Ollama, etc.). Configure config/config-web.yaml.
  • Execution: Run python app.py. Access the UI at http://localhost:7860.
  • Optional: A Neo4j database enables the knowledge graph functionality. This requires downloading and importing a Neo4j dump file.
  • Links: Project Demo Video: https://www.bilibili.com/video/BV1CU2aYpEn2
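
The setup steps above can be checked with a small pre-flight script before launching the app. The sketch below is hypothetical and not part of the repository: the preflight function and its messages are invented for illustration. It only verifies the Python version and the two file paths named in the steps (.env and config/config-web.yaml), and it does not validate API keys or the contents of either file.

```python
# Hypothetical pre-flight check; not part of the cyber-doctor repository.
import sys
from pathlib import Path

# File paths named in the setup steps, relative to the project root.
REQUIRED_FILES = [".env", "config/config-web.yaml"]
MIN_PYTHON = (3, 10)


def preflight(project_root, python_version=sys.version_info[:2]):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    root = Path(project_root)
    for rel in REQUIRED_FILES:
        if not (root / rel).is_file():
            problems.append(f"missing file: {rel}")
    return problems


if __name__ == "__main__":
    issues = preflight(".")
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Setup looks complete; try: python app.py")
```

Run it from the repository root. Any warnings point at a step that still needs to be completed.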

Highlighted Details

  • Multi-modal Integration: Supports text, voice, and image inputs, with capabilities for image recognition (e.g., medical records), video generation, and document generation (PPT/Word).
  • Enhanced Knowledge Access: Leverages RAG with custom knowledge bases (files), internet search (web crawler), and Neo4j knowledge graphs for context-aware and up-to-date responses.
  • Voice Interaction: Features a dedicated voice dialogue module with STT (Whisper) and TTS (edge-tts) supporting multiple dialects, enabling usage via voice commands.

Maintenance & Community

The project lists several team members and acknowledges reference projects. Specific community channels (like Discord/Slack) or a public roadmap are not explicitly detailed in the README. Contributions via issues and PRs are encouraged for API adaptation and feature improvements.

Licensing & Compatibility

The repository includes a LICENSE file, but its specific terms are not detailed in the provided README content. Compatibility for commercial use or linking with closed-source projects would depend on the exact license terms.

Limitations & Caveats

Setup of the user-specific knowledge base management UI and backend is not well documented. Although the project integrates many features, the README acknowledges room for optimization in individual components, such as more sophisticated entity and relation processing for the knowledge graph.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 30 days
