Awesome-Scientific-Datasets-and-LLMs  by open-sciencelab

Guide to scientific datasets and LLMs

Created 4 weeks ago

346 stars

Top 80.1% on SourcePulse

Summary

This repository collects papers, datasets, and models on Scientific Large Language Models (Sci-LLMs), organized around an accompanying survey paper. It is aimed at researchers and practitioners tracking LLM work in scientific domains. Its main benefit is a single, categorized overview that makes relevant resources easier to find and adopt across scientific disciplines.

How It Works

The project organizes scientific datasets and LLM resources by domain (e.g., Life Sciences, Chemistry, Physics, Astronomy, Materials Science, Earth Science, General Science). It also presents key trends, the historical development paradigms of Sci-LLMs, and timelines of their evolution, giving a structured map of the field's progress.
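The repository itself is a browsable list and ships no code, but the domain-based organization described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Entry` fields, the `group_by_domain` helper, and the placeholder rows are hypothetical and do not come from the repository.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    """One catalog row. All field names are hypothetical."""
    name: str
    kind: str       # "dataset", "model", or "paper"
    domain: str     # e.g. "Chemistry"
    released: date  # release date of the resource


def group_by_domain(entries):
    """Map each domain to its entries, oldest release first."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry.domain].append(entry)
    # Sort domains alphabetically and entries chronologically.
    return {
        domain: sorted(items, key=lambda e: e.released)
        for domain, items in sorted(grouped.items())
    }


# Placeholder rows, not real entries from the repository.
catalog = [
    Entry("example-chem-model", "model", "Chemistry", date(2025, 1, 15)),
    Entry("example-chem-dataset", "dataset", "Chemistry", date(2024, 6, 1)),
    Entry("example-astro-dataset", "dataset", "Astronomy", date(2024, 11, 3)),
]

by_domain = group_by_domain(catalog)
for domain, items in by_domain.items():
    print(domain, [e.name for e in items])
```

Sorting each domain by release date yields a per-domain timeline, which is one way a reader might reproduce the chronological views for their own notes.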

Quick Start & Requirements

This repository functions as a curated knowledge base. There are no installation or execution requirements; users can directly browse the categorized lists of datasets, papers, and models.

Highlighted Details

  • Features an extensive catalog of datasets and models across more than 10 scientific domains, each with detailed metadata and release dates.
  • Includes analyses of publication trends, evolution of Sci-LLM paradigms, and chronological overviews of key developments.
  • Highlights a significant number of recent datasets and models, with many entries dated from 2024 to 2025.

Maintenance & Community

Contributions and suggestions are welcome by email (huming@pjlab.org.cn, clma24@m.fudan.edu.cn, litianbin@pjlab.org.cn). The README provides citation details for the associated survey paper. No dedicated community channels (e.g., Discord, Slack) are listed.

Licensing & Compatibility

The provided README content does not specify a license for the repository's curated resources or the collection itself.

Limitations & Caveats

As a curated list, its comprehensiveness is subject to the pace of updates in the rapidly evolving Sci-LLM field. While extensive, it may not encompass every nascent dataset or model.

Health Check

Last Commit: 2 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 1
Star History: 347 stars in the last 29 days
