awesome-bioie by caufieldjh

Curated list of resources for Biomedical Information Extraction (BioIE)

created 6 years ago
384 stars

Top 75.6% on sourcepulse

Project Summary

This repository is a curated list of resources for Biomedical Information Extraction (BioIE), targeting researchers and engineers working with unstructured biomedical data. It provides a comprehensive overview of methods, tools, datasets, and organizations in the field, aiming to facilitate the extraction of structured knowledge from complex biological and clinical text.

How It Works

The list is organized into categories covering research overviews, active groups, journals, conferences, challenges, tutorials, code libraries, tools, annotation platforms, techniques, datasets, and ontologies. It emphasizes publicly accessible, cost-free resources with permissive licenses, and it tracks the field's rapid evolution, driven by advances in Large Language Models (LLMs) and BERT-based models.

Quick Start & Requirements

This is a curated list, not a software package. To engage with the resources:

  • Code Libraries: Many libraries (e.g., spaCy, Biopython, medaCy, ScispaCy) are available via pip.
  • Datasets: Access often requires registration, data use agreements, or UMLS Terminology Services (UTS) accounts.
  • Tools: Some tools offer demos (e.g., CLAMP) or require local installation.
  • Resources: Links to papers, GitHub repos, and official documentation are provided throughout.
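
Because the list itself installs nothing, a reasonable first step is to check which of the libraries named above are already present in the local Python environment. The sketch below uses only the standard library. The top-level import names in the mapping (`spacy`, `Bio`, `medacy`, `scispacy`) are assumptions based on each project's usual conventions, and `check_installed` is a hypothetical helper, not part of awesome-bioie.

```python
from importlib.util import find_spec

# Display name -> top-level import name.
# These import names are assumptions; verify them against each project's docs.
LIBRARIES = {
    "spaCy": "spacy",
    "Biopython": "Bio",
    "medaCy": "medacy",
    "ScispaCy": "scispacy",
}


def check_installed(libraries):
    """Map each display name to True if its module can be located.

    find_spec() looks the module up without importing it, so this check
    does not load any of the libraries.
    """
    return {
        name: find_spec(module) is not None
        for name, module in libraries.items()
    }


if __name__ == "__main__":
    for name, available in check_installed(LIBRARIES).items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

Any library reported as missing should be installed by following that project's own instructions, since package names and model downloads differ from one project to another.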

Highlighted Details

  • Extensive coverage of LLMs and BERT variants (BioBERT, ClinicalBERT, SciBERT, PubMedBERT) applied to biomedical tasks.
  • Detailed lists of annotated datasets for entities, relations (e.g., PPI), and events, including corpora like BC5CDR, CRAFT, and n2c2.
  • Information on major BioIE research groups, conferences (e.g., ACL BioNLP, BIBM, ISMB), and challenges (e.g., BioASQ, BioCreative).
  • Includes resources for ontologies and controlled vocabularies like UMLS, Disease Ontology, and RxNorm.

Maintenance & Community

The list is community-driven and accepts contributions via pull requests. It references active research groups at institutions such as Boston Children's Hospital, Mayo Clinic, and NIH/NLM.

Licensing & Compatibility

The list favors resources that are free of charge and carry few license restrictions. Specific datasets may still have usage restrictions or require registration, so suitability for commercial use depends on each resource's own license.

Limitations & Caveats

The field is rapidly evolving, particularly with LLMs, meaning some "Pre-LLM Guides" may lack the latest context. Dataset accessibility can vary, with some requiring significant administrative steps or having specific usage terms.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 14 stars in the last 90 days
