mLLMCelltype by cafferychen777

Framework for single-cell RNA-seq cell type annotation using LLM consensus

Created 9 months ago

623 stars

Top 53.1% on SourcePulse

Project Summary

mLLMCelltype provides an iterative multi-LLM consensus framework for accurate cell type annotation in single-cell RNA sequencing (scRNA-seq) data. It targets bioinformaticians and computational biologists, offering improved annotation accuracy and transparent uncertainty quantification by leveraging the collective intelligence of diverse large language models.

How It Works

The framework employs a multi-LLM consensus architecture, where multiple LLMs analyze gene expression data and marker genes. A structured deliberation process allows these models to share reasoning and refine annotations over several rounds. This collective intelligence approach mitigates individual model biases and errors, leading to more robust and accurate cell type identification.
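The deliberation loop described above can be sketched in a few lines. This is an illustrative sketch only, not mLLMCelltype's implementation: the `Annotator` signature, `run_consensus`, and its `threshold` and `max_rounds` parameters are hypothetical names, and the stub functions stand in for real LLM calls.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical signature: an annotator receives the marker genes plus the
# labels every model gave in the previous round, and returns one label.
Annotator = Callable[[List[str], Dict[str, str]], str]


def run_consensus(
    markers: List[str],
    annotators: Dict[str, Annotator],
    threshold: float = 1.0,
    max_rounds: int = 3,
) -> Dict[str, object]:
    """Query every model, then let them revise until enough of them agree."""
    if not annotators:
        raise ValueError("at least one annotator is required")
    previous: Dict[str, str] = {}
    label, proportion, rounds = "", 0.0, 0
    for rounds in range(1, max_rounds + 1):
        # Every model sees the same snapshot of the previous round's answers.
        current = {
            name: ask(markers, dict(previous)) for name, ask in annotators.items()
        }
        label, votes = Counter(current.values()).most_common(1)[0]
        proportion = votes / len(current)
        previous = current
        if proportion >= threshold:
            break  # agreement reached, no further rounds needed
    return {"label": label, "proportion": proportion, "rounds": rounds, "votes": previous}


# Stub annotators standing in for real LLM calls (assumptions for illustration).
def stub_a(markers: List[str], previous: Dict[str, str]) -> str:
    return "T cell"


def stub_b(markers: List[str], previous: Dict[str, str]) -> str:
    # Disagrees at first, then adopts the majority label once it can see one.
    if previous:
        return Counter(previous.values()).most_common(1)[0][0]
    return "NK cell"


def stub_c(markers: List[str], previous: Dict[str, str]) -> str:
    return "T cell"


result = run_consensus(
    ["CD3D", "CD3E", "IL7R"],
    {"model_a": stub_a, "model_b": stub_b, "model_c": stub_c},
)
print(result["label"], result["rounds"])
```

With these stubs the models disagree in the first round and converge in the second. In a real setup each annotator would wrap a provider API call and pass the other models' reasoning, not just their labels, into the next prompt.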

Quick Start & Requirements

Installation:
- R: devtools::install_github("cafferychen777/mLLMCelltype", subdir = "R")
- Python: pip install mllmcelltype
Prerequisites: API keys for supported LLMs (OpenAI, Anthropic, Google Gemini, Alibaba Qwen, DeepSeek, Zhipu, MiniMax, Stepfun, X.AI Grok, OpenRouter). Python 3.x or R.
Setup: Install the R or Python package and configure an API key for each LLM provider you plan to use.
Documentation: mLLMCelltype documentation website
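Before running a multi-model annotation, it can help to check which provider keys are actually configured. The sketch below assumes keys are supplied through environment variables and uses the names conventionally read by the providers' own SDKs; whether mLLMCelltype reads these exact names is an assumption to verify against its documentation. `PROVIDER_ENV_VARS` and `configured_providers` are hypothetical helpers, not part of the package.

```python
import os
from typing import Dict, List, Mapping

# Assumed variable names (conventional for each provider's SDK, not confirmed
# for mLLMCelltype). Extend the mapping for other providers as needed.
PROVIDER_ENV_VARS: Dict[str, str] = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose API key variable is set and non-blank."""
    return [
        provider
        for provider, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


available = configured_providers()
if len(available) < 2:
    # A consensus across models needs at least two of them.
    print(f"Only {len(available)} provider(s) configured: {available}")
else:
    print(f"Ready for consensus annotation with: {available}")
```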

Highlighted Details

Supports a wide range of LLMs including GPT-4o, Claude-3.5, Gemini 2.0, Grok-3, Qwen2.5, and more.
Provides transparent uncertainty quantification via Consensus Proportion and Shannon Entropy metrics.
Seamlessly integrates with Scanpy and Seurat workflows.
Does not require reference datasets for annotation.
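The two uncertainty metrics named above can be illustrated with their textbook definitions: the consensus proportion is the share of models that voted for the most common label, and the Shannon entropy measures how spread out the votes are. This is a minimal sketch under those assumptions; the package's exact formulas (for example the logarithm base, taken here as 2) may differ, and `consensus_metrics` is a hypothetical helper name.

```python
import math
from collections import Counter
from typing import List, Tuple


def consensus_metrics(labels: List[str]) -> Tuple[str, float, float]:
    """Return (majority label, consensus proportion, Shannon entropy in bits)."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    top_label, top_votes = counts.most_common(1)[0]
    consensus_proportion = top_votes / total
    # H = sum over labels of p * log2(1 / p); zero when all models agree.
    entropy = sum((n / total) * math.log2(total / n) for n in counts.values())
    return top_label, consensus_proportion, entropy


# Three models agree: full consensus, zero entropy.
unanimous = consensus_metrics(["B cell", "B cell", "B cell"])

# Four models split 2/1/1: half agree, and the entropy is 1.5 bits.
split = consensus_metrics(["B cell", "B cell", "Plasma cell", "T cell"])

print(unanimous)
print(split)
```

A high consensus proportion with low entropy marks a cluster the models agree on, while a low proportion with high entropy flags a cluster worth manual review.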

Maintenance & Community

Active development with recent updates (v1.1.4 as of April 2025).
Community support via Discord.
Open to contributions for new LLM support, documentation, and features.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing terms before commercial use.

Limitations & Caveats

Requires API keys for most supported LLMs, which may incur costs.
Performance and accuracy are dependent on the quality of input marker genes and the chosen LLMs.
The README does not specify compatibility with older R or Python versions.

Health Check

Last Commit: 2 weeks ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 56 stars in the last 30 days
