scBERT by TencentAILabHealthcare

Pretrained deep learning model for cell type annotation

created 3 years ago
321 stars

Project Summary

scBERT is a deep learning model for cell type annotation of single-cell RNA sequencing (scRNA-seq) data. It addresses limitations of existing methods by using a pre-trained Transformer architecture to model gene-gene interactions, cope with batch effects, and exploit latent information in large unlabeled datasets. It is aimed at researchers in genomics and bioinformatics.

How It Works

scBERT follows a pre-train and fine-tune paradigm. It is first pre-trained on large unlabeled scRNA-seq datasets to learn general patterns of gene-gene interaction. It is then fine-tuned on user-specific labeled data for supervised cell type annotation. The encoder is based on the Performer, a Transformer variant whose attention cost grows linearly rather than quadratically with sequence length. This makes it practical to attend over the long, high-dimensional expression profile of each cell and capture complex relationships between genes.
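The Performer's efficiency comes from replacing the softmax attention matrix with a random-feature approximation, so the n x n attention matrix over genes is never formed. The sketch below is a generic NumPy illustration of that idea (positive random features in the style of FAVOR+). It is not scBERT's code. The function names, the feature count, and the use of plain (non-orthogonal) Gaussian projections are all assumptions made for illustration.

```python
import numpy as np

def softmax_attention(q, k, v):
    """Exact softmax attention; builds an (n, n) matrix, so cost is O(n^2)."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def positive_random_features(x, w):
    """phi(x) = exp(w.x - |x|^2 / 2) / sqrt(m), so that E[phi(q) . phi(k)] = exp(q . k)."""
    m = w.shape[0]
    proj = x @ w.T                                        # (n, m)
    half_sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def performer_attention(q, k, v, num_features=4096, seed=0):
    """Linear-time approximation of softmax attention (hypothetical helper, FAVOR+-style)."""
    d = q.shape[-1]
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((num_features, d))            # Gaussian projections (not orthogonalized here)
    scale = d ** -0.25                                    # splits the 1/sqrt(d) factor between q and k
    q_prime = positive_random_features(q * scale, w)      # (n, m)
    k_prime = positive_random_features(k * scale, w)      # (n, m)
    kv = k_prime.T @ v                                    # (m, d_v): no (n, n) matrix is formed
    normalizer = q_prime @ k_prime.sum(axis=0)            # (n,) row sums of the implicit attention matrix
    return (q_prime @ kv) / normalizer[:, None]
```

With enough random features, `performer_attention` closely tracks `softmax_attention`. Its cost is O(n * m) instead of O(n^2), which is what makes attention over thousands of genes per cell tractable.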

Quick Start & Requirements

  • Install: Clone the repository; the code is written in Python.
  • Prerequisites: Python, PyTorch, and scanpy. Before use, gene symbols must be revised against the NCBI Gene database (January 10, 2020 release), and expression data must be normalized with scanpy (sc.pp.normalize_total followed by sc.pp.log1p).
  • Setup Time: Approximately 30 minutes for typical installation on a desktop.
  • Resources: Inference for 10,000 cells takes about 25 minutes on a desktop.
  • Links: Pre-trained model checkpoint and data can be downloaded via provided links. Preprocessing details are in preprocess.py.
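The preprocessing requirements can be sketched in NumPy as follows. First, align the columns of a cell-by-gene count matrix to a fixed reference gene list. Then apply total-count normalization followed by log1p. This is a minimal illustration under assumptions, not the repository's preprocess.py. The helper names and the reference gene list are hypothetical. target_sum=1e4 is an assumed default, and the log base used by scBERT should be checked in preprocess.py. With scanpy available, the normalization corresponds to `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`.

```python
import numpy as np

def align_to_reference(counts, genes, reference_genes):
    """Reorder columns to follow reference_genes; genes missing from the data are zero-filled.

    counts: (cells, genes) array whose column symbols (genes) are assumed to be
    already revised to the NCBI Gene symbols the model expects.
    reference_genes: hypothetical list of genes the pre-trained model was built on.
    """
    column_of = {gene: i for i, gene in enumerate(genes)}
    aligned = np.zeros((counts.shape[0], len(reference_genes)), dtype=float)
    for j, gene in enumerate(reference_genes):
        i = column_of.get(gene)
        if i is not None:
            aligned[:, j] = counts[:, i]
    return aligned

def normalize_total_log1p(counts, target_sum=1e4):
    """Scale each cell to target_sum total counts, then apply natural-log log1p."""
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # empty cells stay all-zero instead of dividing by zero
    return np.log1p(counts / totals * target_sum)
```

Aligning before normalizing keeps every cell's input vector in the same gene order and length. A fixed-vocabulary model such as scBERT needs that consistency.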

Highlighted Details

  • Utilizes a Performer encoder for efficient Transformer processing.
  • Pre-trained on large-scale unlabeled scRNA-seq data.
  • Fine-tunable for user-specific cell annotation tasks.
  • Supports detection of novel cell types via probability thresholding.
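Novel cell type detection by probability thresholding can be illustrated in a few lines. A cell is assigned its most probable known type only if that probability clears a threshold; otherwise it is flagged as unknown. In this sketch, the threshold of 0.5, the "Unassigned" label, and the function name are assumptions for illustration, not values confirmed from the repository.

```python
import numpy as np

def annotate_with_rejection(logits, class_names, threshold=0.5, unknown="Unassigned"):
    """Label each cell with its top class, or `unknown` if the top softmax probability < threshold.

    logits: (cells, classes) array of classifier outputs (hypothetical input format).
    """
    z = logits - logits.max(axis=1, keepdims=True)  # stabilize the softmax
    probs = np.exp(z)
    probs /= probs.sum(axis=1, keepdims=True)
    best = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    return [class_names[b] if c >= threshold else unknown
            for b, c in zip(best, confidence)]
```

For example, a cell with logits [5, 0, 0] over ["T", "B", "NK"] is labeled "T". A cell with near-uniform logits [0.1, 0, 0] has a top probability of about 0.36 and is labeled "Unassigned".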

Maintenance & Community

Developed by Tencent AI Lab. The project is associated with a publication in Nature Machine Intelligence. Questions can be sent to fionafyang@tencent.com.

Licensing & Compatibility

All rights reserved by Tencent AI Lab. The tool is for research purposes only and not approved for clinical use.

Limitations & Caveats

The tool is intended for research use only and is not approved for clinical use. Input data must have gene symbols revised against a specific NCBI Gene database release (January 10, 2020), and must be normalized with scanpy in a specific way.

Health Check

Last commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 15 stars in the last 90 days
