omicverse by Starlitnightly

Python library for multi-omics analysis

Created 4 years ago
740 stars

Top 46.8% on SourcePulse

Project Summary

OmicVerse is a Python library for multi-omics analysis, focused on bulk, single-cell, and spatial RNA sequencing data. It gives researchers a unified framework for integrating and analyzing these transcriptomic datasets across biological contexts. It is aimed mainly at bioinformaticians and computational biologists who work with complex RNA-seq data.

How It Works

OmicVerse builds its data framework on pandas, anndata, numpy, and mudata. Its main algorithmic contribution is BulkTrajBlend, which combines a beta-variational autoencoder (β-VAE) for deconvolution with graph neural networks for community discovery. The algorithm is designed to interpolate "omission" cells, restoring continuity in single-cell RNA-seq data and improving data completeness. The library also includes a research submodule (omicverse.llm.dr) that uses large language models to generate reports from user queries, including web search retrieval and synthesis.

Quick Start & Requirements

Install OmicVerse with conda (conda install omicverse -c conda-forge) or pip (pip install -U omicverse). PyTorch must be installed first. Depending on which LLM or vector store features you use, additional dependencies may include scanpy, tdigest, peft, datasets, accelerate, chromadb, and langchain_community. Detailed installation guides are available for Windows, Linux, and macOS.
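Since PyTorch has to be in place before OmicVerse, a small preflight check can help confirm the environment. The sketch below uses only the Python standard library. The helper name missing_packages is hypothetical, and the listed import names (for example torch for PyTorch) are assumptions that may differ from the package names used by pip or conda.

```python
from importlib.util import find_spec

# Import names, assumed here; they may differ from pip/conda package names.
REQUIRED = ["torch", "omicverse"]  # PyTorch first, then OmicVerse
OPTIONAL = [
    "scanpy", "tdigest", "peft", "datasets",
    "accelerate", "chromadb", "langchain_community",
]


def missing_packages(names):
    """Return the top-level module names in `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing_required = missing_packages(REQUIRED)
    missing_optional = missing_packages(OPTIONAL)
    if missing_required:
        print("Missing required packages:", ", ".join(missing_required))
    if missing_optional:
        print("Missing optional packages:", ", ".join(missing_optional))
    if not missing_required and not missing_optional:
        print("All listed packages were found.")
```

The check only locates modules without importing them, so it stays fast even when heavy packages such as PyTorch are present.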

Highlighted Details

  • Integrates a wide array of published single-cell analysis tools, including Scanpy, MOFA, CellphoneDB, scVI, and Tangram.
  • Features an LLM-backed domain research module for automated report generation and synthesis, with support for live web search retrieval via Tavily or DuckDuckGo.
  • The BulkTrajBlend algorithm offers a novel approach to data imputation and continuity restoration in scRNA-seq data.

Maintenance & Community

The project is actively maintained, and Zehua Zeng is listed as the primary contact. Contributing guidelines are available, and the project is promoted through WeChat Official Accounts.

Licensing & Compatibility

OmicVerse is licensed under GPL-3.0. The license is copyleft: derivative works that are distributed must also be licensed under GPL-3.0, which has implications for integration into closed-source commercial products.

Limitations & Caveats

GPL-3.0 does not prohibit commercial use as such, but its strong copyleft provisions restrict integration into proprietary software. Some advanced LLM features require API keys (e.g., OpenAI, Tavily) and additional dependencies.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 50
Issues (30d): 56

Star History

31 stars in the last 30 days
