Awesome-Biomolecule-Language-Cross-Modeling by QizhiPei

Survey of biomolecule-language cross-modeling

Created 2 years ago

260 stars

Top 97.5% on SourcePulse

Project Summary

This repository is a curated collection of resources on multi-modal learning between biomolecules and natural language, accompanying the survey paper "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey." It targets researchers, engineers, and practitioners in bioinformatics, computational chemistry, and natural language processing, and consolidates models, datasets, and related literature in one place.

How It Works

This project is an organized index of academic papers, models, datasets, and other relevant resources. It groups these materials into four areas: Biomolecule-Language models (BioText, Text + Molecule, Text + Protein, Text + BioMulti), Datasets & Benchmarks, Related Surveys, and Related Repositories. The compilation maps the landscape of cross-modal learning, highlighting the methods that connect biological entities with textual representations.

Quick Start & Requirements

This repository is a curated list of academic resources and contains no executable code, so there is nothing to install and there are no software requirements. The primary resource is the linked survey paper.

Highlighted Details

Extensive cataloging of models across BioText, Text + Molecule, Text + Protein, and Text + BioMulti modalities, featuring prominent examples like BioBERT, MolT5, ProGen, and Galactica.
Comprehensive listing of datasets and benchmarks crucial for training and evaluating biomolecule-language cross-modal models, including PubMed, ZINC, UniProt, and MoleculeNet.
Inclusion of related surveys, evaluations, and repositories to provide broader context and facilitate further exploration within the scientific LLM domain.
The repository is actively maintained, with a note indicating updates to the paper and collection as of December 1, 2025, covering developments since February 2024.

Maintenance & Community

The repository is maintained by Qizhi Pei and Lijun Wu. Contributions are welcomed via pull requests or issues. Direct inquiries can be sent to qizhipei@ruc.edu.cn or lijun_wu@outlook.com.

Licensing & Compatibility

The repository itself does not specify a software license. Users should refer to the individual licenses of the linked papers and resources for their respective terms of use and compatibility.

Limitations & Caveats

As a curated list, this repository is a snapshot of the rapidly evolving field of biomolecule-language cross-modeling. It does not host models or datasets; it only links to them. The most recent update noted in the repository is dated 2025.12.01, so later work may not yet be listed.

Explore Similar Projects

Mol-Instructions by zjunlp

Scientific-LLM-Survey by HICAI-ZJU

torch-molecule by liugangcode

awesome-pretrain-on-molecules by junxia97

awesome-protein-representation-learning by LirongWu

papers-for-molecular-design-using-DL by AspirinCode

OpenBioMed by PharMolix

papers_for_protein_design_using_DL by Peldom

Machine-learning-for-proteins by yangkky

PaddleHelix by PaddlePaddle

DeepPurpose by kexinhuang12345

deeplearning-biology by hussius