Single-cell foundation model leveraging ChatGPT embeddings for gene/cell biology
GenePT is a foundation model for single-cell biology, offering a user-friendly and efficient approach to gene-level and cell-level tasks by leveraging ChatGPT embeddings of NCBI gene descriptions. It is designed for researchers and bioinformaticians working with single-cell RNA sequencing data who seek to bypass extensive data curation and computationally intensive training of traditional foundation models.
How It Works
GenePT embeds the NCBI text description of each gene with OpenAI's pre-trained text embedding models (text-embedding-ada-002 and text-embedding-3-large), and the resulting vector serves as that gene's embedding. For cell-level analysis, GenePT builds cell embeddings in one of two ways: by averaging gene embeddings weighted by each gene's expression level, or by concatenating the cell's gene names, ordered by expression, into a sentence and embedding that text. Because it needs neither dataset curation nor additional pre-training, the method is efficient and accessible.
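The two cell-embedding strategies can be sketched in a few lines of Python. This is a minimal illustration, not the repository's code. The function names `weighted_cell_embedding` and `cell_sentence` are hypothetical. Random toy vectors stand in for the embeddings of NCBI gene descriptions. The final L2 normalization is an assumption. In practice the ordered gene-name sentence would then be sent to the same text embedding model, for example via the OpenAI Python client's `client.embeddings.create(model=..., input=sentence)`, which is not called here.

```python
import numpy as np


def weighted_cell_embedding(expression, gene_names, gene_embeddings):
    """Expression-weighted average of gene embeddings for one cell.

    `gene_embeddings` (hypothetical input) maps gene symbols to vectors; in
    GenePT these would come from embedding each gene's NCBI description.
    Genes with zero expression or no embedding are skipped.
    """
    dim = len(next(iter(gene_embeddings.values())))
    total = np.zeros(dim)
    weight_sum = 0.0
    for name, value in zip(gene_names, expression):
        if value > 0 and name in gene_embeddings:
            total += value * np.asarray(gene_embeddings[name], dtype=float)
            weight_sum += value
    if weight_sum == 0:
        return total  # no expressed genes with known embeddings: zero vector
    avg = total / weight_sum
    # Assumption: L2-normalize so cells can be compared by cosine similarity.
    return avg / np.linalg.norm(avg)


def cell_sentence(expression, gene_names):
    """Names of expressed genes, ordered by decreasing expression, as one string."""
    order = np.argsort(-np.asarray(expression, dtype=float), kind="stable")
    return " ".join(gene_names[i] for i in order if expression[i] > 0)


# Toy usage with made-up genes, embeddings, and counts.
genes = ["CD3E", "MS4A1", "GAPDH"]
rng = np.random.default_rng(0)
emb = {g: rng.normal(size=8) for g in genes}
cell = np.array([5.0, 0.0, 2.0])

vec = weighted_cell_embedding(cell, genes, emb)
print(vec.shape)                 # (8,)
print(cell_sentence(cell, genes))  # CD3E GAPDH
```

The weighted average keeps one fixed-size vector per cell regardless of how many genes are expressed, while the sentence form hands ordering information to the language model instead of encoding it numerically.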
Maintenance & Community
The project is associated with authors from academic institutions. Further community engagement details (e.g., Discord, Slack) are not explicitly mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. The project's code and data usage should be reviewed for compatibility with commercial or closed-source applications.
Limitations & Caveats
The project relies on external OpenAI API services, which may incur costs and are subject to OpenAI's terms of service. Specific datasets used in the paper may require separate download and processing.