mLLMCelltype by cafferychen777

Framework for single-cell RNA-seq cell type annotation using LLM consensus

created 3 months ago
545 stars

Top 59.3% on sourcepulse

Project Summary

mLLMCelltype provides an iterative multi-LLM consensus framework for cell type annotation in single-cell RNA sequencing (scRNA-seq) data. It targets bioinformaticians and computational biologists, and aims to improve annotation accuracy and quantify uncertainty transparently by combining the predictions of diverse large language models.

How It Works

The framework uses a multi-LLM consensus architecture in which multiple LLMs analyze gene expression data and marker genes. In a structured deliberation process, the models share their reasoning and refine their annotations over several rounds. Pooling predictions this way reduces the influence of any single model's biases and errors, which makes the resulting cell type identification more robust.
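
The deliberation loop described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not mLLMCelltype's actual implementation: the function `run_consensus`, the `Annotator` signature, the 0.7 agreement threshold, and the two stub annotators are all hypothetical, and real LLM calls are replaced by plain functions so the example runs offline.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical annotator signature: (marker genes, peers' labels from the
# previous round or None) -> predicted cell type label.
Annotator = Callable[[List[str], Optional[List[str]]], str]


def run_consensus(markers: List[str], annotators: List[Annotator],
                  threshold: float = 0.7, max_rounds: int = 3) -> dict:
    """Poll every annotator, sharing the previous round's votes, until
    the majority label reaches the agreement threshold or rounds run out."""
    previous = None
    for round_no in range(1, max_rounds + 1):
        votes = [annotate(markers, previous) for annotate in annotators]
        label, count = Counter(votes).most_common(1)[0]
        proportion = count / len(votes)
        if proportion >= threshold:
            break
        previous = votes  # shared with every annotator in the next round
    return {"label": label, "proportion": proportion,
            "rounds": round_no, "votes": votes}


# Stub annotators standing in for real LLM calls (assumptions for illustration).
def marker_based(markers, peers):
    # Ignores peers and decides from a single canonical marker.
    return "T cell" if "CD3E" in markers else "Unknown"


def persuadable(markers, peers):
    # Starts with a different guess, then adopts the peers' majority label.
    if peers is None:
        return "NK cell"
    return Counter(peers).most_common(1)[0][0]


result = run_consensus(["CD3E", "CD3D", "IL7R"],
                       [marker_based, marker_based, persuadable])
print(result)
```

In this toy run the first round splits two to one, which falls below the threshold, so a second round is held in which the dissenting stub sees its peers' votes and switches. A real deployment would replace the stubs with calls to different LLM providers and pass along their written reasoning, not just the labels.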

Quick Start & Requirements

  • Installation:
    • R: devtools::install_github("cafferychen777/mLLMCelltype", subdir = "R")
    • Python: pip install mllmcelltype
  • Prerequisites: API keys for supported LLMs (OpenAI, Anthropic, Google Gemini, Alibaba Qwen, DeepSeek, Zhipu, MiniMax, Stepfun, X.AI Grok, OpenRouter). Python 3.x or R.
  • Resources: Requires API access to the chosen LLMs. Setup involves configuring API keys and installing the R or Python package.
  • Documentation: mLLMCelltype documentation website
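
Before running a multi-model annotation, it can help to check which provider keys are actually configured. The sketch below assumes keys are supplied through environment variables; the variable names in `PROVIDER_ENV_VARS` and the helper `configured_providers` are illustrative assumptions, not names confirmed by the mLLMCelltype documentation, so check the project's docs for the names it expects.

```python
import os
from typing import List, Mapping, Optional

# Assumed environment variable names, for illustration only.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the providers whose API key variable is set and non-blank."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_ENV_VARS.items()
            if env.get(var, "").strip()]


available = configured_providers()
if len(available) < 2:
    # A consensus across models needs at least two of them.
    print(f"Only {len(available)} provider(s) configured: {available}")
else:
    print(f"Providers ready for consensus: {available}")
```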

Highlighted Details

  • Supports a wide range of LLMs including GPT-4o, Claude-3.5, Gemini 2.0, Grok-3, Qwen2.5, and more.
  • Provides transparent uncertainty quantification via Consensus Proportion and Shannon Entropy metrics.
  • Integrates with Scanpy and Seurat workflows.
  • Does not require reference datasets for annotation.
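
The two uncertainty metrics named above can be illustrated with a short Python sketch. It assumes the common definitions: Consensus Proportion as the share of models that voted for the majority label, and Shannon Entropy as the entropy (in bits) of the distribution of labels across models. The function names are hypothetical and the project's exact formulas may differ.

```python
import math
from collections import Counter
from typing import List


def consensus_proportion(labels: List[str]) -> float:
    """Fraction of models that agree with the most common label."""
    top_count = Counter(labels).most_common(1)[0][1]
    return top_count / len(labels)


def shannon_entropy(labels: List[str]) -> float:
    """Entropy in bits of the label distribution; 0 means full agreement."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


# Four hypothetical model votes for one cluster.
votes = ["T cell", "T cell", "T cell", "NK cell"]
cp = consensus_proportion(votes)
entropy = shannon_entropy(votes)
print(f"consensus proportion = {cp:.2f}, entropy = {entropy:.3f} bits")
```

A high consensus proportion with low entropy marks a confident annotation, while a low proportion with high entropy flags a cluster that may need manual review.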

Maintenance & Community

  • Active development with recent updates (v1.1.4 as of April 2025).
  • Community support via Discord.
  • Open to contributions for new LLM support, documentation, and features.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Users should verify licensing terms before commercial use.

Limitations & Caveats

  • Requires API keys for most supported LLMs, which may incur costs.
  • Performance and accuracy are dependent on the quality of input marker genes and the chosen LLMs.
  • The README does not specify compatibility with older R or Python versions.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 220 stars in the last 90 days
