HKU-BAL: Deep learning variant caller for long-read sequencing
Summary
Clair3 is a deep learning-based caller for germline small variants in long-read sequencing data. It combines two calling strategies: a fast pileup-based model that identifies most variant candidates, and a slower, more precise full-alignment model for complex cases. According to the project, this dual-model design performs particularly well at lower sequencing coverage, which makes it useful for researchers and bioinformaticians who need both high recall and high precision in variant detection.
How It Works
Clair3 integrates two deep learning models: a pileup model and a full-alignment model. The pileup model processes summarized alignment statistics and identifies the majority of variant candidates efficiently. Candidates that need higher confidence or show more complexity are passed to a haplotype-resolved full-alignment model, which is more computationally intensive. This split keeps overall runtime low while preserving precision and recall; the project reports that it outperforms previous generations with a significant reduction in errors.
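The routing between the two models can be illustrated with a minimal Python sketch. This is not Clair3's code: the `Candidate` and `Call` types, the 0.9 threshold, and the toy scoring function are all hypothetical, chosen only to show how a cheap first pass can defer uncertain candidates to a more expensive second pass.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    pos: int            # genomic position of the candidate site
    pileup_qual: float  # hypothetical confidence (0 to 1) from the pileup pass


@dataclass
class Call:
    pos: int
    qual: float
    source: str  # which stage produced the final call


def two_stage_call(
    candidates: List[Candidate],
    full_alignment_model: Callable[[Candidate], float],
    qual_threshold: float = 0.9,  # hypothetical cutoff, not a Clair3 default
) -> List[Call]:
    """Accept confident pileup calls; send the rest to the slower model."""
    calls = []
    for cand in candidates:
        if cand.pileup_qual >= qual_threshold:
            # Confident enough: keep the fast pileup result.
            calls.append(Call(cand.pos, cand.pileup_qual, "pileup"))
        else:
            # Uncertain: re-score with the expensive full-alignment stand-in.
            calls.append(Call(cand.pos, full_alignment_model(cand), "full_alignment"))
    return calls


def toy_full_alignment(cand: Candidate) -> float:
    # Stand-in for a real model: simply raises the confidence a little.
    return min(1.0, cand.pileup_qual + 0.3)


candidates = [Candidate(100, 0.99), Candidate(250, 0.55), Candidate(400, 0.92)]
calls = two_stage_call(candidates, toy_full_alignment)
for call in calls:
    print(call.pos, call.source, round(call.qual, 2))
```

In this sketch only the second candidate reaches the expensive stage, which is the point of the design: the costly model runs on a small fraction of sites.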
Quick Start & Requirements
Clair3 can be installed via Mamba/Conda (recommended, and required for GPU or Apple Silicon acceleration), Docker (CPU only), or Singularity (CPU only). Key dependencies are Python 3.11, Samtools (>=1.10), and PyTorch, plus CUDA if GPU acceleration is wanted. Pre-trained PyTorch models for several platforms (ONT, PacBio HiFi, Illumina) are available for download. GPU acceleration gives roughly a 5x speedup over CPU.
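Before installing, it can help to confirm that the local Samtools meets the >=1.10 requirement. The sketch below is an illustrative helper, not part of Clair3; the function names are hypothetical. It assumes the first line of `samtools --version` output has the form `samtools 1.17`, and it compares versions as integer tuples because a plain string comparison would wrongly rank "1.9" above "1.10".

```python
import re
from typing import Tuple

MIN_SAMTOOLS = (1, 10)  # minimum version named in the requirements above


def parse_samtools_version(version_output: str) -> Tuple[int, ...]:
    """Extract the numeric version from the first line of the output."""
    lines = version_output.strip().splitlines()
    if not lines:
        raise ValueError("empty version output")
    match = re.search(r"(\d+(?:\.\d+)+)", lines[0])
    if match is None:
        raise ValueError(f"no version number found in: {lines[0]!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def samtools_is_new_enough(
    version_output: str, minimum: Tuple[int, ...] = MIN_SAMTOOLS
) -> bool:
    """Compare as integer tuples so that 1.10 correctly ranks above 1.9."""
    return parse_samtools_version(version_output) >= minimum


print(samtools_is_new_enough("samtools 1.17\nUsing htslib 1.17"))  # True
print(samtools_is_new_enough("samtools 1.9\nUsing htslib 1.9"))    # False
```

To check a real installation, the text passed in would come from something like `subprocess.run(["samtools", "--version"], capture_output=True, text=True).stdout`.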
Highlighted Details
Maintenance & Community
The project lists contact emails for Ruibang Luo, Zhenxian Zheng, and Xian Yu. Several recent updates highlight contributions from various individuals (e.g., @Devon Ryan, @Sam Nicholls, @William Shropshire), indicating active development and community involvement in bug fixes and feature enhancements. No specific community channels (like Slack or Discord) are listed.
Licensing & Compatibility
The license type is not explicitly stated in the provided README content. Users considering commercial use or integration into closed-source projects should confirm the license terms before proceeding.
Limitations & Caveats
Docker and Singularity images are limited to CPU execution; GPU or Apple Silicon acceleration necessitates a Mamba/Conda installation. TensorFlow models from Clair3 v1 are incompatible with the PyTorch-based v2.0. The --enable_variant_calling_at_sequence_head_and_tail option, while useful for amplicon data, should be used cautiously due to potentially less reliable alignments in those regions.