awesome-bioie by caufieldjh

Curated list of resources for Biomedical Information Extraction (BioIE)

Created 6 years ago

459 stars

Top 65.1% on SourcePulse

Project Summary

This repository is a curated list of resources for Biomedical Information Extraction (BioIE), targeting researchers and engineers working with unstructured biomedical data. It provides a comprehensive overview of methods, tools, datasets, and organizations in the field, aiming to facilitate the extraction of structured knowledge from complex biological and clinical text.

How It Works

The list is organized into categories covering research overviews, active groups, journals, conferences, challenges, tutorials, code libraries, tools, annotation platforms, techniques, datasets, and ontologies. It favors publicly accessible, cost-free resources with permissive licenses. Its coverage also tracks the field's rapid shift toward BERT-based models and Large Language Models (LLMs).

Quick Start & Requirements

This is a curated list, not a software package. To engage with the resources:

Code Libraries: Many libraries (e.g., spaCy, Biopython, medaCy, ScispaCy) are available via pip.
Datasets: Access often requires registration, a data use agreement, or a UMLS Terminology Services (UTS) account.
Tools: Some tools offer demos (e.g., CLAMP) or require local installation.
Resources: Links to papers, GitHub repos, and official documentation are provided throughout.
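
To make the task concrete before choosing one of the listed libraries, the following is a minimal, standard-library-only sketch of dictionary-based entity tagging, the simplest form of the extraction these resources address. It is not the API of spaCy, ScispaCy, medaCy, or any other listed tool; the lexicon contents, the labels, and the function name `tag_entities` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy lexicon: lowercase surface form -> entity label.
# Real systems would draw terms from vocabularies such as UMLS or RxNorm.
LEXICON = {
    "aspirin": "CHEMICAL",
    "myocardial infarction": "DISEASE",
    "headache": "DISEASE",
}


def tag_entities(text, lexicon=LEXICON):
    """Return (start, end, surface, label) tuples for lexicon terms found in text."""
    # List longer terms first so multi-word terms win over shorter overlaps.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), lexicon[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


sentence = "Aspirin reduced the risk of myocardial infarction."
entities = tag_entities(sentence)
for start, end, surface, label in entities:
    print(f"{start:>3}-{end:<3} {label:<9} {surface}")
```

Exact string matching like this misses synonyms, abbreviations, and misspellings, which is one reason the list points to trained models and curated ontologies rather than hand-built lexicons.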

Highlighted Details

Extensive coverage of LLMs and BERT variants (BioBERT, ClinicalBERT, SciBERT, PubMedBERT) applied to biomedical tasks.
Detailed lists of annotated datasets for entities, relations (e.g., PPI), and events, including corpora like BC5CDR, CRAFT, and n2c2.
Information on major BioIE research groups, conferences (e.g., ACL BioNLP, BIBM, ISMB), and challenges (e.g., BioASQ, BioCreative).
Includes resources for ontologies and controlled vocabularies like UMLS, Disease Ontology, and RxNorm.

Maintenance & Community

The list is community-driven and accepts contributions via pull requests. It catalogs active research groups at institutions such as Boston Children's Hospital, Mayo Clinic, and NIH/NLM.

Licensing & Compatibility

The list favors resources that are free of charge and carry few license restrictions. However, specific datasets may have usage restrictions or require registration. Whether a resource can be used commercially depends on that resource's own license.

Limitations & Caveats

The field is evolving rapidly, particularly with LLMs, so material listed under "Pre-LLM Guides" may not reflect current methods. Dataset accessibility varies: some datasets require significant administrative steps or come with specific usage terms.

Explore Similar Projects

Awesome-Foundation-Models-for-Advancing-Healthcare by YutingHe-list

Huatuo-26M by FreedomIntelligence

BLUE_Benchmark by ncbi-nlp

awesome-open-data-annotation by zenml-io

s2orc by allenai

bluebert by ncbi-nlp

biobert-pretrained by naver

biomcp by genomoncology

OpenBioMed by PharMolix

MedLLMsPracticalGuide by AI-in-Health

getting-started-with-genomics-tools-and-resources by crazyhottommy

openmed by maziyarpanahi