svaba by walaj

Local assembly-based caller for structural variations and indels

Created 9 years ago
255 stars

Top 98.7% on SourcePulse

Summary

SvABA is a short-read structural variation (SV) and indel caller that leverages local genome assembly for precise variant detection. It targets researchers and bioinformaticians analyzing germline or somatic samples, offering detailed evidence for variant calls and supporting tumor/normal, trio, and single-sample analyses.

How It Works

SvABA performs local genome assembly on candidate regions using either Fermi-lite or SGA, then realigns the assembled contigs with BWA-MEM. Variants are scored by the read support recovered from the reassembled contigs, which gives direct evidence for each SV and indel call. Reconstructing the genomic context around each variation is what enables high-resolution detection.
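As a rough illustration of what scoring by read support can mean, the sketch below counts reads whose alignment to an assembled contig spans a breakpoint with flanking sequence on both sides. The function name, the min_flank parameter, and the tuple-based read representation are all hypothetical. This is a toy version of the idea, not SvABA's actual scoring model.

```python
from typing import Iterable, Tuple


def count_spanning_reads(
    read_alignments: Iterable[Tuple[int, int]],
    breakpoint_pos: int,
    min_flank: int = 10,
) -> int:
    """Count reads aligned to a contig that span a breakpoint.

    Each alignment is a half-open interval (start, end) in contig
    coordinates. A read counts as support only when it covers at least
    `min_flank` bases on both sides of `breakpoint_pos`.
    (Hypothetical helper for illustration; not part of SvABA.)
    """
    support = 0
    for start, end in read_alignments:
        left_flank = breakpoint_pos - start   # bases before the breakpoint
        right_flank = end - breakpoint_pos    # bases after the breakpoint
        if left_flank >= min_flank and right_flank >= min_flank:
            support += 1
    return support


# Made-up alignments against a contig with a breakpoint at position 100.
reads = [(0, 100), (45, 145), (80, 180), (95, 195)]
print(count_spanning_reads(reads, breakpoint_pos=100))
```

Requiring a minimum flank on each side is a common way to avoid counting reads that only graze a breakpoint; the threshold of 10 bases here is arbitrary.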

Quick Start & Requirements

Installation requires cloning the repository and building with CMake; the key dependencies are CMake itself and an external htslib installation. The build type defaults to RelWithDebInfo, and performance can be improved by manually enabling -O3 -mcpu=native for the vendored components. Two assemblers are supported, Fermi-lite (the default, and the faster of the two) and SGA, and the choice is made at compile time. A typical workflow has three steps: run svaba run, post-process the output with scripts/svaba_postprocess.sh, and convert the results to VCF with svaba tovcf. Official documentation is available in CLAUDE.md, and interactive HTML viewers are included.
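The first step of that workflow could be scripted roughly as follows. This is a minimal sketch that only builds the argument list for svaba run and prints it. The flags used (-t, -n, -G, -a, -p) are assumptions based on commonly documented SvABA usage and should be checked against the tool's own help output for the installed version. The function name and the file names are hypothetical. The post-processing and VCF conversion steps are left out because their arguments are not described here.

```python
import shlex
from typing import List, Optional


def build_svaba_run_cmd(
    reference: str,
    analysis_id: str,
    case_bam: str,
    control_bam: Optional[str] = None,
    threads: int = 1,
) -> List[str]:
    """Build an argument list for `svaba run` (hypothetical helper).

    ASSUMPTION: the flags below follow commonly documented SvABA usage
    (-t case BAM, -n control BAM, -G reference, -a analysis ID,
    -p threads) and may differ between versions.
    """
    cmd = ["svaba", "run", "-t", case_bam]
    if control_bam is not None:
        cmd += ["-n", control_bam]          # e.g. matched normal
    cmd += ["-G", reference, "-a", analysis_id, "-p", str(threads)]
    return cmd


# Placeholder file names; nothing is executed here.
cmd = build_svaba_run_cmd("ref.fa", "my_run", "tumor.bam", "normal.bam", threads=4)
print(shlex.join(cmd))
```

Building the command as a list rather than a single string keeps it safe to pass to subprocess.run without shell quoting issues.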

Highlighted Details

  • Supports tumor/normal, trios, and single-sample analysis modes.
  • Outputs variants as VCF alongside a detailed bps.txt.gz file containing per-sample evidence.
  • Includes a bundled, curated blacklist (tracks/hg38.combined_blacklist.bed) to improve runtime and reduce false positives in complex genomic regions.
  • Provides interactive, browser-based HTML viewers for exploring assembled contigs, read alignments, and runtime statistics.
  • Features a svaba refilter command for post-hoc tuning of variant scoring thresholds without re-running the primary analysis pipeline.
  • Enables targeted assembly over specified regions using BED files or coordinate strings.
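To make the two region formats mentioned above concrete, the sketch below parses a coordinate string and BED lines into a common interval representation and checks for overlap, for example against a blacklist. BED intervals are 0-based and half-open by definition. Treating coordinate strings as 1-based and inclusive follows the samtools convention and is an assumption here, not something confirmed for SvABA. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def parse_coordinate_string(text: str) -> Region:
    """Parse 'chr1:1,001-2,000' into a 0-based half-open region.

    ASSUMPTION: the string is 1-based and inclusive (samtools style).
    """
    chrom, _, span = text.rpartition(":")
    start_s, _, end_s = span.partition("-")
    start = int(start_s.replace(",", ""))
    end = int(end_s.replace(",", ""))
    if not chrom or start < 1 or end < start:
        raise ValueError(f"invalid region string: {text!r}")
    return (chrom, start - 1, end)


def parse_bed_lines(lines: Iterable[str]) -> List[Region]:
    """Parse BED lines, skipping blanks, comments, and header lines."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def overlaps(a: Region, b: Region) -> bool:
    """Return True when two half-open regions share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


target = parse_coordinate_string("chr1:1,001-2,000")
blacklist = parse_bed_lines(["# example", "chr1\t1500\t1600", "chr2\t0\t100"])
print(target, [overlaps(target, region) for region in blacklist])
```

Converting everything to one coordinate system up front avoids the off-by-one errors that tend to appear when BED files and coordinate strings are mixed.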

Maintenance & Community

Developed by Jeremiah Wala (Dana-Farber Cancer Institute) and collaborators at the Broad Institute. Bug reports, feature requests, and questions are managed via the GitHub issues tracker. The project notes the use of AI tools (OpenAI Codex, Anthropic Claude) in its development and documentation. No community chat links (e.g., Slack, Discord) are provided.

Licensing & Compatibility

Licensed under GNU GPLv3. This is a strong copyleft license, requiring derivative works to also be licensed under GPLv3. Compatibility for commercial use or integration into closed-source projects should be carefully evaluated due to these restrictions.

Limitations & Caveats

If htslib is not installed system-wide, its path must be configured manually at build time. Optimization for the vendored assemblers is hardcoded to -O2 by default, so any higher optimization level requires a manual rebuild. The --dump-reads option generates extremely large output files and is suitable only for deep debugging. Specific tuning is provided for germline analysis, so somatic workflows may need their own parameter choices.

Health Check

Last Commit: 1 week ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 2
Star History: 0 stars in the last 30 days
