Scientific-LLM-Survey by HICAI-ZJU

Survey of scientific LLMs, focusing on biology and chemistry

Created 2 years ago

352 stars

Top 79.5% on SourcePulse

Project Summary

This repository is a survey of Scientific Large Language Models (Sci-LLMs) applied to biology and chemistry. It consolidates papers, datasets, and benchmarks for researchers and practitioners in these domains, giving a structured overview of a rapidly evolving field.

How It Works

The survey groups Sci-LLMs by primary data modality and application area: Textual (medical, biology, chemistry), Molecular (property prediction, generation, reaction prediction), Protein, Genomic, and Multimodal (combining different data types). Within each category it lists the relevant papers, datasets, and benchmarks.
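The categories named above can be written down as a small lookup structure. The following is a minimal sketch for readers who want to index their own reading notes against the survey's grouping; the dictionary layout and the `find_categories` helper are illustrative assumptions and are not shipped by the repository. Categories for which no sub-areas are named here are left with empty lists.

```python
from typing import List

# Category names and sub-areas are taken from the description above.
# The dict layout and helper name are hypothetical, not part of the repo.
SCI_LLM_TAXONOMY = {
    "Textual": ["medical", "biology", "chemistry"],
    "Molecular": ["property prediction", "generation", "reaction prediction"],
    "Protein": [],
    "Genomic": [],
    "Multimodal": [],
}


def find_categories(keyword: str) -> List[str]:
    """Return top-level categories whose name or sub-areas contain keyword."""
    needle = keyword.lower()
    return [
        category
        for category, subareas in SCI_LLM_TAXONOMY.items()
        if needle in category.lower() or any(needle in s for s in subareas)
    ]


if __name__ == "__main__":
    # Look up which category covers reaction prediction.
    print(find_categories("reaction"))
```

A flat dictionary is enough here because the grouping described is only two levels deep; a deeper hierarchy would call for a tree instead.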

Quick Start & Requirements

This repository is a curated survey, so there is nothing to install or run. It links to papers, code, and datasets for further exploration.

Highlighted Details

Coverage of LLMs across biological and chemical domains, including molecular, protein, and genomic data.
Listings of datasets and benchmarks for evaluating Sci-LLM performance.
Categorization of models by input modality (text, molecule, protein, genome) and by task.
Regular updates to incorporate new research, with the most recent version of the survey available on arXiv.

Maintenance & Community

The project is maintained by HICAI-ZJU and lists several contributors. Users are encouraged to recommend missing papers via issues or pull requests. Contact information for Xinda Wang is provided.

Licensing & Compatibility

The repository itself is a survey and does not impose licensing restrictions. Individual papers and code linked within the survey will have their own respective licenses.

Limitations & Caveats

As a survey, this repository does not provide executable code or models. Its value lies in cataloging existing research; users must access and evaluate the linked resources themselves.

Explore Similar Projects

OmniGenBench by COLA-Laboratory

Mol-Instructions by zjunlp

Awesome-Scientific-Datasets-and-LLMs by InternScience

awesome-matchem-datasets by blaiszik

Awesome-Scientific-Language-Models by yuzhimanhua

OpenBioMed by PharMolix

cell2sentence by vandijklab

papers_for_protein_design_using_DL by Peldom

Machine-learning-for-proteins by yangkky

PaddleHelix by PaddlePaddle

getting-started-with-genomics-tools-and-resources by crazyhottommy

deeplearning-biology by hussius