Medical Q&A with deep language models
This project provides a medical question-answering system that leverages deep learning models like BERT and GPT-2 to retrieve and generate answers from a large corpus of medical data. It is targeted at researchers and developers interested in exploring advanced NLP techniques for specialized domains, offering a novel approach to medical information retrieval.
How It Works
The system works in two stages: retrieval, then generation. First, a fine-tuned BioBERT model encodes medical questions and answers into vector representations. Separate fully connected feed-forward networks (FCNNs), one for questions and one for answers, then map these embeddings into a shared metric space. Training uses a custom cross-entropy loss over question-answer similarities that treats the other answers in a batch as negative samples, which pulls the embeddings of matching question-answer pairs closer together. Finally, a fine-tuned GPT-2 model generates an answer conditioned on the question and the top-k retrieved pieces of relevant medical information.
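The retrieval objective described above can be sketched in a few lines. This is a minimal illustration, not the project's code. It assumes cosine similarity as the similarity measure, no temperature scaling, and non-zero embedding vectors. The function names (`in_batch_cross_entropy`, `top_k`) are hypothetical. It uses only the Python standard library so that it runs without TensorFlow or FAISS, which a real pipeline would use in place of these loops.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def in_batch_cross_entropy(q_embs, a_embs):
    """Mean cross-entropy over a batch of (question, answer) pairs.

    Pair i is the positive for question i; every other answer in the
    batch serves as a negative. Cosine similarity is an assumption here.
    """
    q = [normalize(v) for v in q_embs]
    a = [normalize(v) for v in a_embs]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, aj) for aj in a]
        # Numerically stable log-sum-exp over all answers in the batch.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_z - logits[i]
    return total / len(q)


def top_k(query_emb, answer_embs, k):
    """Indices of the k answers most similar to the query (brute force)."""
    q = normalize(query_emb)
    scores = [dot(q, normalize(a)) for a in answer_embs]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy 2-D embeddings standing in for the FCNN outputs.
questions = [[1.0, 0.0], [0.0, 1.0]]
answers_aligned = [[1.0, 0.1], [0.1, 1.0]]   # answer i matches question i
answers_swapped = [[0.1, 1.0], [1.0, 0.1]]   # answer i matches the other question

loss_aligned = in_batch_cross_entropy(questions, answers_aligned)
loss_swapped = in_batch_cross_entropy(questions, answers_swapped)
best = top_k([1.0, 0.0], answers_aligned, k=1)
```

The loss is lower when each question sits closest to its own answer (`loss_aligned < loss_swapped`), which is the behaviour the training objective rewards. At query time the same similarity ranks candidate answers, and the top-k results would be passed to the generator.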
Quick Start & Requirements
pip install tensorflow-gpu==2.0.0-alpha0
pip install mkl
pip install https://github.com/Santosh-Gupta/DocProduct/archive/master.zip
FAISS (CPU or GPU) installation requires manual download and compilation.
Highlighted Details
Uses tf.data and TFRecords for efficient preprocessing of large datasets.
Maintenance & Community
The project was a finalist in a TensorFlow challenge and presented to the TensorFlow Engineering Team. Collaboration is welcomed via email at Research2Vec@gmail.com.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The authors state that the project is not intended to provide actionable medical advice and is not ready for widespread commercial use. The end-to-end demo is experimental, and installing FAISS is complex and requires manual steps.
Last updated 2 years ago; status: inactive.