Cell type classifier for scRNA-seq datasets using logistic regression
CellTypist is a Python package for semi-automatic cell type annotation of single-cell RNA sequencing (scRNA-seq) data. It leverages logistic regression classifiers trained on reference datasets to predict cell identities in query datasets, offering both automated and multi-label classification modes. The tool is designed for researchers and bioinformaticians working with scRNA-seq data who need to annotate cell populations efficiently and accurately.
How It Works
CellTypist classifies cells with logistic regression models optimized via stochastic gradient descent. Users can apply pre-built models (e.g., for immune cell subtypes) or train custom models on their own reference data. The input is a gene expression count matrix (cell-by-gene or gene-by-cell) or an AnnData object, and the output is a set of predicted cell type labels, decision scores, and probabilities. An optional majority voting step can refine predictions by considering cell-cell transcriptomic relationships within clusters.
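The relationship between decision scores, probabilities, labels, and majority voting can be illustrated with a small standalone sketch. This is not CellTypist's implementation: it assumes a one-vs-rest setup in which each decision score is passed through a sigmoid, and it treats cluster assignments as given. The helper names (`sigmoid`, `predict_labels`, `majority_vote`) and the toy numbers are made up for illustration.

```python
import math
from collections import Counter


def sigmoid(score):
    """Map a raw decision score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def predict_labels(decision_scores, cell_types):
    """For each cell, pick the cell type with the highest decision score."""
    labels = []
    for scores in decision_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])
        labels.append(cell_types[best])
    return labels


def majority_vote(labels, clusters):
    """Replace each cell's label with the most common label in its cluster."""
    counts_by_cluster = {}
    for label, cluster in zip(labels, clusters):
        counts_by_cluster.setdefault(cluster, Counter())[label] += 1
    winners = {
        cluster: counts.most_common(1)[0][0]
        for cluster, counts in counts_by_cluster.items()
    }
    return [winners[cluster] for cluster in clusters]


# Toy data (hypothetical): one row of decision scores per cell,
# one column per cell type.
cell_types = ["B cell", "T cell"]
decision_scores = [
    [2.1, -1.3],
    [1.4, -0.2],
    [-0.1, 0.3],  # borderline cell that sits in cluster 0
    [-1.8, 2.5],
    [-0.9, 1.1],
]
clusters = [0, 0, 0, 1, 1]

probabilities = [[sigmoid(s) for s in row] for row in decision_scores]
raw_labels = predict_labels(decision_scores, cell_types)
voted_labels = majority_vote(raw_labels, clusters)
```

In this toy case the third cell is labeled "T cell" on its own scores, but voting within cluster 0 reassigns it to "B cell", which is the kind of refinement majority voting is meant to provide.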
Quick Start & Requirements
pip install celltypist
or conda install -c bioconda -c conda-forge celltypist
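After installation, a first annotation run might look like the sketch below. It assumes the package's documented entry points (`celltypist.annotate`, `celltypist.models.download_models`) and the pre-built model name `Immune_All_Low.pkl`; the wrapper function `annotate_query` and the file name in the example call are hypothetical, and the details should be checked against the installed version.

```python
def annotate_query(input_path, model_name="Immune_All_Low.pkl", majority_voting=True):
    """Annotate a query dataset with a pre-built CellTypist model.

    input_path is a count matrix file or an AnnData file readable by CellTypist.
    """
    # Imported inside the function so this file can be loaded
    # even where celltypist is not installed.
    import celltypist
    from celltypist import models

    # Download the named model (needs network access).
    models.download_models(model=model_name)

    predictions = celltypist.annotate(
        input_path, model=model_name, majority_voting=majority_voting
    )
    # Per-cell predicted labels (assumed to be exposed as a table).
    return predictions.predicted_labels


# Example call (hypothetical file name):
# labels = annotate_query("query_counts.csv")
```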
Highlighted Details
Maintenance & Community
The project is associated with Teichlab. Further community engagement details are not explicitly listed in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility with commercial or closed-source projects is not specified.
Limitations & Caveats
The README notes that when subsetting a model, retraining on the original reference data is more accurate than using the subset method. Cross-species conversion relies on ortholog mapping files, and by default uses Ensembl version 105. The tool is primarily Python-based, with no direct R compatibility mentioned.
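The idea behind ortholog-based conversion can be sketched in a few lines. This is an illustration only, not CellTypist's conversion code: it assumes a simple one-to-one mapping table, and the function `convert_genes`, the small mapping, and the placeholder gene `HUMAN_ONLY_GENE` are hypothetical.

```python
def convert_genes(genes, ortholog_map):
    """Translate gene symbols via a one-to-one ortholog map.

    Returns the converted symbols and the genes that had no ortholog.
    """
    converted = []
    dropped = []
    for gene in genes:
        target = ortholog_map.get(gene)
        if target is None:
            dropped.append(gene)
        else:
            converted.append(target)
    return converted, dropped


# Toy human-to-mouse mapping; a real mapping file would cover the whole genome.
human_to_mouse = {"CD3E": "Cd3e", "CD19": "Cd19", "MS4A1": "Ms4a1"}
model_genes = ["CD3E", "CD19", "HUMAN_ONLY_GENE", "MS4A1"]

mouse_genes, unmapped = convert_genes(model_genes, human_to_mouse)
```

Genes without an ortholog drop out of the converted feature set, which is one reason converted models may lose some predictive signal.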