scGPT by bowang-lab

Foundation model for single-cell multi-omics research

created 2 years ago
1,289 stars

Top 31.6% on sourcepulse

Project Summary

scGPT aims to build a foundation model for single-cell multi-omics analysis using generative AI. It provides pre-trained models and tools for tasks like cell embedding, annotation, and reference mapping, targeting researchers and bioinformaticians working with large-scale single-cell datasets.

How It Works

scGPT leverages a generative transformer architecture, similar to large language models, to learn representations from single-cell data. It processes gene expression profiles as sequences, enabling it to perform various downstream tasks through fine-tuning or zero-shot learning. The model's design allows for efficient handling of large datasets and supports flexible integration with existing bioinformatics tools.
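
As a concrete illustration, here is a minimal sketch of zero-shot cell embedding. The embed_data helper, its arguments, and the X_scGPT embedding key are reconstructed from the project's zero-shot tutorials; treat the names and paths as assumptions to verify against those tutorials.

    import scanpy as sc
    from scgpt.tasks import embed_data  # helper shown in the zero-shot tutorials (assumed import path)

    # Query dataset: an AnnData object with raw counts and gene symbols.
    adata = sc.read_h5ad("my_dataset.h5ad")  # placeholder path

    # Embed cells with a downloaded pre-trained checkpoint (whole-human recommended).
    adata = embed_data(
        adata,
        model_dir="checkpoints/scGPT_human",  # placeholder: unzipped checkpoint directory
        gene_col="gene_name",                 # assumed column in adata.var holding gene symbols
        batch_size=64,
    )

    # Embeddings land in adata.obsm (key name assumed from the tutorials) and
    # feed standard scanpy workflows such as neighbors/UMAP/clustering.
    sc.pp.neighbors(adata, use_rep="X_scGPT")
    sc.tl.umap(adata)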

Quick Start & Requirements

  • Install via pip: pip install scgpt "flash-attn<1.0.5" (add "orbax<0.1.8" to the same command if you run into orbax compatibility issues).
  • Recommended: Python >= 3.7.13, R >= 3.6.1.
  • Optional: pip install wandb for logging.
  • The flash-attn dependency needs a compatible GPU and CUDA version (CUDA 11.7 with flash-attn<1.0.5 recommended as of May 2023); a quick environment check is sketched after this list.
  • Pre-trained checkpoints are available for download; the whole-human model is the recommended starting point.
  • Tutorials and online apps are available for reference mapping, cell annotation, and GRN inference.
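
The flash-attn pin is the usual stumbling block, so a short sanity check like the following (my own sketch, not from the README) can catch a broken environment before a long run:

    import torch

    # flash-attn needs a CUDA-capable GPU; confirm one is visible first.
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA version:", torch.version.cuda)    # README recommends 11.7
        print("GPU:", torch.cuda.get_device_name(0))

    # flash-attn is optional; without it scGPT can still load models on CPU.
    try:
        import flash_attn
        print("flash-attn:", flash_attn.__version__)  # README pins < 1.0.5
    except ImportError:
        print("flash-attn not installed; CPU loading still works.")

    import scgpt  # finally, confirm the package itself imports cleanly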

Highlighted Details

  • Pre-trained on over 33 million human cells (whole-human model).
  • Supports zero-shot cell embedding and reference mapping against millions of cells efficiently (e.g., an index over 33M cells occupies under 1 GB and answers searches in under 1 s on GPU; see the sketch after this list).
  • Online apps available for browser-based interaction.
  • Flash-attention is now an optional dependency, allowing CPU loading.
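
The 33M-cell figure implies a compressed approximate-nearest-neighbor index over cell embeddings rather than raw vectors. Below is a hypothetical sketch of that pattern using faiss; the library choice, index string, dimension, and file names are illustrative assumptions, not the project's exact configuration.

    import numpy as np
    import faiss  # illustrative library choice for compressed similarity search

    d = 512  # embedding dimension, assumed for illustration
    ref = np.load("ref_embeddings.npy").astype("float32")      # placeholder reference embeddings
    query = np.load("query_embeddings.npy").astype("float32")  # placeholder query embeddings

    # IVF + product quantization stores ~32 bytes per cell instead of 2 KB of
    # raw float32, which is how tens of millions of cells can fit in about 1 GB.
    index = faiss.index_factory(d, "IVF4096,PQ32")
    sample = ref[np.random.choice(len(ref), min(200_000, len(ref)), replace=False)]
    index.train(sample)  # learn coarse centroids and PQ codebooks
    index.add(ref)       # encode and add every reference cell

    # Reference mapping: nearest reference neighbors vote on each query cell's label.
    distances, neighbors = index.search(query, 10)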

Maintenance & Community

  • Active development with recent updates (Feb 2024) including preliminary HuggingFace integration.
  • Tutorials for zero-shot applications and continual pre-trained models are available.
  • Contributions are welcomed via pull requests.

Licensing & Compatibility

  • License details are not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The README does not explicitly state the license, which is crucial for commercial adoption.
  • Flash-attention installation can be complex and requires specific hardware/software configurations.
  • Some features, like pretraining code with generative attention masking and HuggingFace integration, are still under development or in preliminary stages.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 3
  • Star history: 74 stars in the last 90 days

Explore Similar Projects

hyena-dna by HazyResearch (704 stars)
Genomic foundation model for long-range DNA sequence modeling
Created 2 years ago, updated 3 months ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 2 more.

open-r1 by huggingface (25k stars)
SDK for reproducing DeepSeek-R1
Created 6 months ago, updated 3 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.