BRAKER  by Gaius-Augustus

Pipeline for automated protein-coding gene prediction in eukaryotic genomes

created 7 years ago
418 stars

Top 71.2% on sourcepulse

Project Summary

BRAKER is a pipeline for automated prediction of protein-coding gene structures in eukaryotic genomes, aimed at researchers and bioinformaticians annotating novel genomes. It integrates extrinsic evidence (RNA-Seq, protein homology) and uses the gene predictors GeneMark-ETP and AUGUSTUS to produce gene annotations.

How It Works

BRAKER trains GeneMark-ETP and AUGUSTUS in a semi-supervised fashion, incorporating extrinsic evidence from RNA-Seq and/or protein alignments. If no extrinsic evidence is available, it can fall back to ab initio prediction on the genome alone. Predictions from both gene finders are combined with TSEBRA, with the aim of high accuracy even when no closely related annotated species or RNA-Seq data is available.

Quick Start & Requirements

  • Installation: Either install the dependencies manually or use the provided Docker container (running it via Singularity is recommended).
  • Prerequisites: Linux environment, Perl, Python 3, and numerous bioinformatics tools (e.g., GeneMark-ETP, AUGUSTUS, SAMtools, BLAST+/DIAMOND, HISAT2). Specific dependencies vary by BRAKER version and input data.
  • Setup: Manual installation can be time-consuming due to the extensive dependency list. The Docker container simplifies deployment.
  • Documentation: Comprehensive user guide available within the repository.
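
Because the prerequisite list is long, a quick preflight check can save time before a manual installation is put to use. The following is a minimal sketch, not part of BRAKER itself: it only tests whether a set of executables can be found on PATH. The executable names in `ASSUMED_TOOLS` are assumptions based on the tools named above and may differ on your system; the helper name `missing_tools` is hypothetical.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for some of the prerequisites listed above.
# Adjust this list to match your installation and BRAKER version.
ASSUMED_TOOLS = ["perl", "python3", "augustus", "samtools", "hisat2", "diamond"]


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in input order."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(ASSUMED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

Finding an executable on PATH does not confirm that its version is compatible with a given BRAKER release, so this check complements the user guide rather than replacing it.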

Highlighted Details

  • Supports multiple input modes: RNA-Seq only, protein data only, or combined RNA-Seq and protein data.
  • Offers experimental support for UTR prediction and integration of long-read RNA-Seq data.
  • Includes options for generating UCSC Genome Browser track hubs.
  • Can utilize BUSCO lineage information to refine gene predictions.
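
The input modes listed above can be sketched as a small helper that assembles a `braker.pl` command line from whichever evidence files are available. This is an illustrative sketch under stated assumptions, not the project's own tooling: the option names `--genome`, `--bam`, `--prot_seq`, `--species`, and `--esmode` are written as I understand them from the BRAKER user guide and should be verified against `braker.pl --help` for your version. The function name `build_braker_command` and all file names are hypothetical.

```python
from typing import List, Optional


def build_braker_command(
    genome: str,
    bam: Optional[str] = None,
    proteins: Optional[str] = None,
    species: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for braker.pl based on available evidence.

    Option names are assumptions taken from the BRAKER user guide;
    check them against your installed version before running.
    """
    cmd = ["braker.pl", f"--genome={genome}"]
    if bam:
        # RNA-Seq evidence as aligned reads
        cmd.append(f"--bam={bam}")
    if proteins:
        # Protein homology evidence as a FASTA file
        cmd.append(f"--prot_seq={proteins}")
    if not bam and not proteins:
        # No extrinsic evidence: request a genome-only (ab initio) run
        cmd.append("--esmode")
    if species:
        cmd.append(f"--species={species}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, printed rather than executed
    print(" ".join(build_braker_command("genome.fa", bam="rnaseq.bam", proteins="proteins.fa")))
```

Returning an argument list instead of a shell string keeps the sketch safe to pass to `subprocess.run` without shell quoting concerns, should you choose to execute it.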

Maintenance & Community

The project is actively maintained by a core team from the University of Greifswald and Georgia Tech, with contributions from a wider scientific community. Bug reporting and discussions are managed via GitHub issues. Contact information for key developers is provided.

Licensing & Compatibility

The BRAKER pipeline scripts are licensed under the Artistic License. Users must also comply with the licenses of the underlying tools (GeneMark, AUGUSTUS, etc.), which may have different terms. Whether commercial use is permitted therefore depends on the licenses of all integrated components.

Limitations & Caveats

  • Manual installation of dependencies is complex and error-prone.
  • UTR prediction and long-read RNA-Seq integration are experimental and may be unstable.
  • Compatibility issues have been reported with some package-manager builds (e.g., from Bioconda) of AUGUSTUS and GeneMark; using the latest sources from GitHub is recommended.
  • The accuracy of predictions is highly dependent on the quality of the input genome assembly and evidence data.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 3

Star History

  • 21 stars in the last 90 days
