walaj: Local assembly-based caller for structural variations and indels
Summary
SvABA is a short-read structural variation (SV) and indel caller that leverages local genome assembly for precise variant detection. It targets researchers and bioinformaticians analyzing germline or somatic samples, offering detailed evidence for variant calls and supporting tumor/normal, trio, and single-sample analyses.
How It Works
The core methodology involves performing local genome assembly on candidate regions using either Fermi-lite or SGA, followed by realignment of assembled contigs with BWA-MEM. Variants are then scored based on reassembled read support, providing robust evidence for SVs and indels. This approach enables high-resolution detection by reconstructing the genomic context around variations.
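To make the read-support idea concrete, here is a toy Python sketch of counting the reads that back an assembled contig across its junction. It is not SvABA's implementation. Exact substring matching stands in for BWA-MEM realignment, and the function name count_supporting_reads, the breakpoint convention (0-based index of the first base after the junction), and the min_overhang parameter are all hypothetical.

```python
from typing import Iterable


def count_supporting_reads(contig: str, breakpoint: int, reads: Iterable[str],
                           min_overhang: int = 5) -> int:
    """Toy read-support count for an assembled contig (hypothetical helper).

    A read supports the variant if it occurs as an exact substring of the
    contig and covers at least `min_overhang` bases on each side of
    `breakpoint`. Exact matching replaces real alignment: reads with
    sequencing errors or from the reverse strand are not counted.
    """
    support = 0
    for read in reads:
        start = contig.find(read)
        while start != -1:
            end = start + len(read)
            # The read must reach min_overhang bases past the junction on both sides.
            if start <= breakpoint - min_overhang and end >= breakpoint + min_overhang:
                support += 1
                break  # count each read at most once
            start = contig.find(read, start + 1)  # try the next occurrence
    return support


if __name__ == "__main__":
    contig = "AAAAACCCCCGGGGGTTTTT"  # junction between the C and G blocks
    reads = ["CCCCCGGGGG", "AAAAACCCCC", "ACCCCCGGGGGT", "TTTTTAAAAA"]
    print(count_supporting_reads(contig, 10, reads))  # 2
```

Only reads that straddle the junction count, which is why reassembled read support is strong evidence for a variant: a read lying entirely on one side of the breakpoint is also consistent with the reference.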
Quick Start & Requirements
Installation requires cloning the repository and building with CMake; the key dependencies are CMake and an external htslib installation. The build defaults to the RelWithDebInfo build type, and performance can be boosted by manually enabling -O3 -mcpu=native for the vendored components. SvABA supports both the Fermi-lite (default, faster) and SGA assemblers, selected at compile time. A typical workflow runs svaba run, post-processes the output with scripts/svaba_postprocess.sh, and converts the results to VCF with svaba tovcf. Documentation is provided in CLAUDE.md, and interactive HTML viewers are included.
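The three workflow stages can be chained from Python as sketched below. This is an illustration under stated assumptions: svaba is on PATH, the post-processing script is invoked with bash from the repository root, and no options are passed, since the real flags for each stage should come from the project documentation. The helper names build_workflow and run_workflow are hypothetical.

```python
import shlex
import subprocess
from typing import List, Sequence


def build_workflow(run_args: Sequence[str] = (),
                   postprocess_args: Sequence[str] = (),
                   tovcf_args: Sequence[str] = ()) -> List[List[str]]:
    """Return the three stages of a typical SvABA workflow, in order.

    Hypothetical helper: arguments for each stage are supplied by the caller;
    no SvABA flags are assumed here.
    """
    return [
        ["svaba", "run", *run_args],                                  # 1. call variants
        ["bash", "scripts/svaba_postprocess.sh", *postprocess_args],  # 2. post-process
        ["svaba", "tovcf", *tovcf_args],                              # 3. convert to VCF
    ]


def run_workflow(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command (dry run) or execute it, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_workflow(build_workflow())  # dry run: prints the three command lines
```

Keeping dry_run as the default makes it easy to review the exact command lines before launching a long run.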
Highlighted Details
Produces a bps.txt.gz file containing per-sample evidence.
Supports region blacklists (e.g., tracks/hg38.combined_blacklist.bed) to improve runtime and reduce false positives in complex genomic regions.
Provides a svaba refilter command for post-hoc tuning of variant scoring thresholds without re-running the primary analysis pipeline.
Maintenance & Community
Developed by Jeremiah Wala (Dana-Farber Cancer Institute) and collaborators at the Broad Institute. Bug reports, feature requests, and questions are managed via the GitHub issues tracker. The project notes the use of AI tools (OpenAI Codex, Anthropic Claude) in its development and documentation. No community chat links (e.g., Slack, Discord) are provided.
Licensing & Compatibility
Licensed under GNU GPLv3. This is a strong copyleft license, requiring derivative works to also be licensed under GPLv3. Compatibility for commercial use or integration into closed-source projects should be carefully evaluated due to these restrictions.
Limitations & Caveats
The build requires manually configuring the htslib path if htslib is not installed system-wide. Optimization for the vendored assemblers is hardcoded to -O2 by default, so potential performance gains require manual recompilation. The --dump-reads option generates extremely large output files and is suitable only for deep debugging. Some tuning is specific to germline analysis, so somatic workflows may require separate consideration.