Awesome-Bio-Foundation-Models  by apeterswu

Foundation models for biological sequences and structures

Created 1 year ago
256 stars

Top 98.7% on SourcePulse

Project Summary

This repository is a curated collection of foundation models and research papers on biological sequences and structures, covering DNA, RNA, proteins, and single-cell data. It is aimed at researchers and practitioners in bioinformatics and computational biology who want to apply large language models (LLMs) and deep learning to biological problems. Its main benefit is a centralized, organized overview of the fast-moving bio-LLM field, which makes relevant models, papers, and tools easier to find.

How It Works

The collection groups foundation models and papers by biological domain. It highlights approaches that apply transformer architectures and self-supervised learning to biological sequences, treating them much like text in natural language processing. Models trained this way can learn sequence patterns, predict function, and generate novel sequences.
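To make the self-supervised idea concrete, here is a minimal sketch of how a DNA sequence might be turned into tokens and partially hidden for a masked-token objective. It is an illustration only: the overlapping k-mer tokenization, the 15% mask rate, the `[MASK]` token, and the function names `kmer_tokenize` and `mask_tokens` are assumptions chosen for the example, not the method of any specific model in the list.

```python
import random


def kmer_tokenize(sequence, k=3):
    """Split a sequence into overlapping k-mers (one possible tokenization, assumed here)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Hide a random subset of tokens.

    Returns the masked token list and a dict mapping each masked
    position to the original token the model would have to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = mask_token
    return masked, targets


tokens = kmer_tokenize("ATGCGTACGTTAG", k=3)
masked, targets = mask_tokens(tokens)
print(masked)
print(targets)
```

During pretraining, a model would receive `masked` as input and be trained to recover the entries in `targets`. Because overlapping k-mers share characters with their neighbors, masking single tokens as done here leaks information; the sketch ignores that for brevity.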

Quick Start & Requirements

This is a curated list of resources, not a runnable software package. Refer to the individual papers or repositories for installation and execution instructions. Requirements vary significantly by model or tool and may include Python, a deep learning framework (such as PyTorch or TensorFlow), model-specific libraries, and GPU acceleration for training or inference. Links to official documentation, demos, or code repositories are often provided within the listed papers or related resources.
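Since requirements differ per model, it can help to check what is already installed before following a linked repository's instructions. The sketch below is a hypothetical helper, not part of this repository: `check_environment` and `gpu_available` are names made up for the example, and the default package list simply mirrors the frameworks mentioned above.

```python
import importlib.util
import sys


def check_environment(packages=("torch", "tensorflow")):
    """Report the Python version and whether each named package is importable."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


def gpu_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the script also runs without PyTorch

    return torch.cuda.is_available()


print(check_environment())
```

Calling `gpu_available()` afterwards indicates whether GPU-backed training or inference is possible through PyTorch on the current machine; other frameworks need their own checks.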

Highlighted Details

  • Comprehensive coverage across DNA, RNA, protein, and single-cell domains.
  • Includes models and papers focusing on structure prediction, function prediction, and generative design.
  • Features links to related surveys and repositories for broader exploration.
  • Papers are generally ordered chronologically within each category.

Maintenance & Community

The repository invites community contributions through pull requests and issues to help keep the collection up to date. It also links to related repositories, pointing to a broader ecosystem of resources and potential collaborators.

Licensing & Compatibility

Licensing information is not provided at the repository level. Users must consult the individual licenses of the papers and associated code repositories for details on usage, distribution, and compatibility, especially for commercial applications.

Limitations & Caveats

As a curated list, the repository itself does not offer direct functionality. The rapid pace of research means that some listed models or papers may become outdated or superseded quickly. Users need to independently evaluate the maturity, performance, and applicability of each resource.

Health Check

Last Commit: 3 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

9 stars in the last 30 days
