celltypist by Teichlab

Cell type classifier for scRNA-seq datasets using logistic regression

Created 4 years ago
404 stars

Top 71.9% on SourcePulse

Project Summary

CellTypist is a Python package for semi-automatic cell type annotation of single-cell RNA sequencing (scRNA-seq) data. It uses logistic regression classifiers trained on reference datasets to predict cell identities in query datasets, and offers both single-label ('best match') and multi-label ('prob match') classification modes. It is aimed at researchers and bioinformaticians who need to annotate cell populations in scRNA-seq data efficiently and accurately.

How It Works

CellTypist classifies cells with logistic regression models optimized via stochastic gradient descent. Users can apply pre-built models (e.g., for immune cell subtypes) or train custom models from their own reference data. The input is a gene expression count matrix (cell-by-gene or gene-by-cell) or an AnnData object; the output is predicted cell type labels, decision scores, and probabilities. An optional majority voting step can refine predictions by considering cell-cell transcriptomic relationships within clusters.
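To make the idea of SGD-trained logistic regression concrete, here is a minimal, dependency-free sketch. It is not CellTypist's implementation: the two-gene toy data, the one-vs-rest setup, and the function names `train_one_vs_rest` and `predict` are all assumptions made for illustration. It shows how a decision score, a probability, and a predicted label can be derived for one query cell.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_one_vs_rest(X, y, classes, epochs=200, lr=0.1, seed=0):
    """Fit one binary logistic regression per class with plain SGD."""
    rng = random.Random(seed)
    n_features = len(X[0])
    weights = {c: [0.0] * n_features for c in classes}
    bias = {c: 0.0 for c in classes}
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit cells in a new random order each epoch
        for i in order:
            for c in classes:
                target = 1.0 if y[i] == c else 0.0
                score = bias[c] + sum(w * x for w, x in zip(weights[c], X[i]))
                error = sigmoid(score) - target  # gradient of log loss w.r.t. the score
                for j in range(n_features):
                    weights[c][j] -= lr * error * X[i][j]
                bias[c] -= lr * error
    return weights, bias


def predict(x, weights, bias):
    """Return the predicted label, per-class decision scores, and probabilities."""
    scores = {c: bias[c] + sum(w * v for w, v in zip(weights[c], x)) for c in weights}
    probs = {c: sigmoid(s) for c, s in scores.items()}
    label = max(scores, key=scores.get)
    return label, scores, probs


# Toy reference: two cells per type, expression of two hypothetical marker genes.
X = [[5.0, 0.1], [4.5, 0.3], [0.2, 4.8], [0.1, 5.2]]
y = ["T cell", "T cell", "B cell", "B cell"]

weights, bias = train_one_vs_rest(X, y, classes=["T cell", "B cell"])
label, scores, probs = predict([4.8, 0.2], weights, bias)
print(label, probs)
```

Each class gets its own score and probability, so the probabilities need not sum to one across classes, which is what makes a multi-label reading of the output possible.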

Quick Start & Requirements

  • Install: pip install celltypist or conda install -c bioconda -c conda-forge celltypist
  • Prerequisites: Python 3.x. Models are downloaded on demand and are typically ~1 MB each.
  • Usage: The README provides detailed examples for the Python API and the command-line interface. See the CellTypist website for interactive tutorials.
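As a rough sketch of Python API usage, the wrapper below calls `celltypist.annotate` and `models.download_models` as I understand them from the package documentation; check the README for the authoritative signatures. The wrapper name `annotate_with_celltypist` is hypothetical, and the default model name `Immune_All_Low.pkl` is an assumption chosen as an example of a pre-built immune model. The import is deferred so the snippet loads even where the package is not installed.

```python
def annotate_with_celltypist(adata, model_name="Immune_All_Low.pkl", majority_voting=True):
    """Annotate an AnnData object with a pre-built CellTypist model.

    Hypothetical convenience wrapper; requires `pip install celltypist`.
    """
    # Deferred imports: only needed when the function is actually called.
    import celltypist
    from celltypist import models

    # Download the model file on demand (assumed to be cached locally afterwards).
    models.download_models(model=model_name)

    # Run the classifier, optionally refining labels by majority voting.
    predictions = celltypist.annotate(
        adata, model=model_name, majority_voting=majority_voting
    )

    # predicted_labels holds per-cell labels; to_adata() returns an AnnData
    # object with the prediction results attached.
    return predictions.predicted_labels, predictions.to_adata()
```

A call would look like `labels, annotated = annotate_with_celltypist(adata)`, where `adata` is an AnnData object prepared as the README describes.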

Highlighted Details

  • Supports classification using pre-built or custom-trained models.
  • Offers 'best match' and 'prob match' modes for single or multi-label classification.
  • Includes a majority voting classifier to leverage cell-cell transcriptomic similarity.
  • Provides functionality for creating custom models and cross-species/gene ID conversion.
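The difference between the two modes, and the effect of majority voting, can be sketched in a few lines. This is a simplification under stated assumptions, not the package's code: the 0.5 threshold, the `|` separator, the `Unassigned` label, and the helper names `assign_label` and `majority_vote` are illustrative choices, and the cluster assignments are supplied by hand rather than computed.

```python
from collections import Counter


def assign_label(probs, mode="best match", p_thres=0.5):
    """Turn per-class probabilities into a label under either mode."""
    if mode == "best match":
        # Single label: the class with the highest probability.
        return max(probs, key=probs.get)
    if mode == "prob match":
        # Zero, one, or several labels: every class above the threshold.
        passing = [c for c, p in probs.items() if p > p_thres]
        return "|".join(passing) if passing else "Unassigned"
    raise ValueError(f"unknown mode: {mode!r}")


def majority_vote(labels, clusters):
    """Replace each cell's label with the most common label in its cluster."""
    by_cluster = {}
    for label, cluster in zip(labels, clusters):
        by_cluster.setdefault(cluster, Counter())[label] += 1
    winner = {c: counts.most_common(1)[0][0] for c, counts in by_cluster.items()}
    return [winner[c] for c in clusters]


cell_probs = {"T cell": 0.81, "NK cell": 0.64, "B cell": 0.03}
best = assign_label(cell_probs, mode="best match")
multi = assign_label(cell_probs, mode="prob match")
unassigned = assign_label({"T cell": 0.2, "B cell": 0.1}, mode="prob match")

# Five cells in two hand-assigned clusters; one cell in cluster 0 disagrees.
labels = ["T cell", "T cell", "NK cell", "B cell", "B cell"]
clusters = [0, 0, 0, 1, 1]
voted = majority_vote(labels, clusters)
print(best, multi, unassigned, voted)
```

Voting trades per-cell resolution for consistency: the lone "NK cell" call in cluster 0 is overruled by its neighbours, which helps with noisy predictions but can hide a genuinely rare cell type.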

Maintenance & Community

The project is associated with Teichlab. Further community engagement details are not explicitly listed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The README notes that, when subsetting a model, retraining on the original reference data is more accurate than using the subset method. Cross-species conversion relies on ortholog mapping files; the default uses Ensembl version 105. The tool is primarily Python-based, and no direct R compatibility is mentioned.
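To illustrate why cross-species conversion depends on the quality of the ortholog mapping, here is a toy sketch of re-keying per-gene model weights from one species to another. It is not how CellTypist performs the conversion: the function `convert_features`, the rule of keeping only one-to-one orthologs, and the placeholder genes `GENE_X` and `GENE_Y` are assumptions for illustration.

```python
def convert_features(weights, ortholog_pairs):
    """Re-key per-gene weights from source-species to target-species genes.

    Only one-to-one orthologs are kept; genes with no partner or with
    several partners are dropped (an assumption made for this sketch).
    """
    forward, reverse = {}, {}
    for src, dst in ortholog_pairs:
        forward.setdefault(src, set()).add(dst)
        reverse.setdefault(dst, set()).add(src)

    converted, dropped = {}, []
    for gene, weight in weights.items():
        targets = forward.get(gene, set())
        if len(targets) == 1:
            (target,) = targets
            if len(reverse[target]) == 1:
                converted[target] = weight
                continue
        dropped.append(gene)
    return converted, dropped


# Toy mapping: two one-to-one pairs plus a placeholder one-to-many gene.
pairs = [
    ("CD3E", "Cd3e"),
    ("CD19", "Cd19"),
    ("GENE_X", "Gene_x1"),
    ("GENE_X", "Gene_x2"),
]
model_weights = {"CD3E": 1.2, "CD19": -0.7, "GENE_X": 0.4, "GENE_Y": 0.9}

converted, dropped = convert_features(model_weights, pairs)
print(converted, dropped)
```

Every dropped gene is a feature the converted model can no longer use, so a mapping file that is out of date or incomplete for the target species directly reduces the information available to the classifier.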

Health Check

  • Last commit: 12 hours ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 8 stars in the last 30 days
