Awesome-Biomolecule-Language-Cross-Modeling by QizhiPei

Survey of biomolecule-language cross-modeling

Created 2 years ago
253 stars

Top 99.3% on SourcePulse

Project Summary

This repository serves as a comprehensive, curated collection of resources related to multi-modal learning between biomolecules and natural language, stemming from the survey paper "Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey." It targets researchers, engineers, and practitioners in bioinformatics, computational chemistry, and natural language processing, providing a centralized hub to accelerate research and development in this interdisciplinary field by consolidating models, datasets, and related literature.

How It Works

This project is an organized index of academic papers, models, datasets, and related resources. It groups them into four areas: Biomolecule-Language models (BioText, Text + Molecule, Text + Protein, and Text + BioMulti), Datasets & Benchmarks, Related Surveys, and Related Repositories. The compilation aims to map the landscape of cross-modal learning, highlighting key advances and methods that connect biological entities with textual representations.
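
The grouping described above can be pictured as a small data structure. The following is only an illustrative sketch, not code from the repository (which contains none): the names `Section`, `INDEX`, and `find_section` are hypothetical, and the sub-category list includes Text + BioMulti because it is named among the modalities under Highlighted Details.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Section:
    """One top-level area of the curated list (hypothetical model)."""
    name: str
    subcategories: list[str] = field(default_factory=list)


# Top-level areas as named in this summary; the layout is an assumption
# made for illustration, not the repository's actual file structure.
INDEX = [
    Section(
        "Biomolecule-Language models",
        ["BioText", "Text + Molecule", "Text + Protein", "Text + BioMulti"],
    ),
    Section("Datasets & Benchmarks"),
    Section("Related Surveys"),
    Section("Related Repositories"),
]


def find_section(label: str) -> Optional[str]:
    """Return the top-level area that contains `label`, or None if absent."""
    for section in INDEX:
        if label == section.name or label in section.subcategories:
            return section.name
    return None
```

For example, `find_section("Text + Protein")` resolves to the models area, while an unknown label yields `None`.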

Quick Start & Requirements

This repository is a curated list of academic resources and does not contain executable code. Therefore, no installation or specific software requirements are necessary to access or utilize the information provided. The primary resource is the linked survey paper.

Highlighted Details

  • Extensive cataloging of models across BioText, Text + Molecule, Text + Protein, and Text + BioMulti modalities, featuring prominent examples like BioBERT, MolT5, ProGen, and Galactica.
  • Comprehensive listing of datasets and benchmarks crucial for training and evaluating biomolecule-language cross-modal models, including PubMed, ZINC, UniProt, and MoleculeNet.
  • Inclusion of related surveys, evaluations, and repositories to provide broader context and facilitate further exploration within the scientific LLM domain.
  • The repository is actively maintained, with a note indicating updates to the paper and collection as of December 1, 2025, covering developments since February 2024.

Maintenance & Community

The repository is maintained by Qizhi Pei and Lijun Wu. Contributions are welcomed via pull requests or issues. Direct inquiries can be sent to qizhipei@ruc.edu.cn or lijun_wu@outlook.com.

Licensing & Compatibility

The repository itself does not specify a software license. Users should refer to the individual licenses of the linked papers and resources for their respective terms of use and compatibility.

Limitations & Caveats

As a curated list, this repository is a snapshot of a rapidly evolving field. It links to models and datasets rather than hosting them. The update date it cites (2025.12.01) appears to lie in the future relative to the information this summary was drawn from, so it may refer to a planned update or reflect an inconsistency in the source information.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

4 stars in the last 30 days
