DocProduct by re-search

Medical Q&A with deep language models

created 6 years ago
571 stars

Top 57.3% on sourcepulse

Project Summary

This project provides a medical question-answering system that leverages deep learning models like BERT and GPT-2 to retrieve and generate answers from a large corpus of medical data. It is targeted at researchers and developers interested in exploring advanced NLP techniques for specialized domains, offering a novel approach to medical information retrieval.

How It Works

The system combines a retrieval model with a generative model. First, a fine-tuned BioBERT model encodes medical questions and answers into vector representations. These embeddings are then passed through separate fully connected neural networks (FCNNs), one for questions and one for answers, which map them into a shared metric space. The networks are trained with a custom cross-entropy loss that treats the other answers in a batch as negative samples, which pulls the embeddings of matching question-answer pairs closer together. Finally, a fine-tuned GPT-2 model generates an answer from the question and the top-k most relevant retrieved medical passages.
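The training objective and retrieval step described above can be illustrated with a minimal pure-Python sketch. It is not the project's implementation: it assumes the similarity score is a plain dot product between already-projected embeddings, and the names `in_batch_cross_entropy` and `top_k` are hypothetical helpers introduced only for this example.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors (assumed similarity score)."""
    return sum(a * b for a, b in zip(u, v))


def in_batch_cross_entropy(question_vecs, answer_vecs):
    """Mean cross-entropy over a batch of paired embeddings.

    Answer i is the positive for question i; every other answer in the
    batch acts as a negative sample. Hypothetical helper for illustration.
    """
    losses = []
    for i, q in enumerate(question_vecs):
        logits = [dot(q, a) for a in answer_vecs]
        # Numerically stable log-sum-exp over all answers in the batch.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the matching answer.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


def top_k(query_vec, answer_vecs, k):
    """Indices of the k answers scoring highest against the query."""
    scores = [dot(query_vec, a) for a in answer_vecs]
    ranked = sorted(range(len(answer_vecs)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    questions = [[1.0, 0.0], [0.0, 1.0]]
    matching = [[5.0, 0.0], [0.0, 5.0]]
    shuffled = [[0.0, 5.0], [5.0, 0.0]]
    # The loss is lower when each question lines up with its own answer.
    print(in_batch_cross_entropy(questions, matching))
    print(in_batch_cross_entropy(questions, shuffled))
    print(top_k([1.0, 0.0], [[0.0, 5.0], [5.0, 0.0], [1.0, 1.0]], k=2))
```

Using the rest of the batch as negatives avoids a separate negative-mining step: each batch of N pairs yields N positives and N(N-1) negatives from the same matrix of scores. In a full system the retrieved indices would select the passages handed to the generator.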

Quick Start & Requirements

Highlighted Details

  • Top 6 finalist in the #PoweredByTF 2.0 Challenge.
  • Utilizes a custom loss function inspired by negative sampling for embedding similarity training.
  • Optimized input pipeline using tf.data and TFRecords for efficient preprocessing of large datasets.
  • Developed an imperative BERT implementation for better debugging and compatibility with TensorFlow 2.0 eager execution.

Maintenance & Community

The project was a finalist in a TensorFlow challenge and presented to the TensorFlow Engineering Team. Collaboration is welcomed via email at Research2Vec@gmail.com.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project explicitly states that it is not intended to provide actionable medical advice and is not ready for widespread commercial use. The end-to-end demo is experimental. Installing FAISS is complex and requires manual steps.

Health Check

Last commit: 2 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 3 stars in the last 90 days
