Genomics foundation models & segmentation tools
This repository provides foundation models for genomics and transcriptomics: the Nucleotide Transformer (NT) and Agro Nucleotide Transformer (AgroNT) for genomic language modeling, and SegmentNT, SegmentEnformer, and SegmentBorzoi for genomic element segmentation at single-nucleotide resolution. It targets researchers and practitioners in bioinformatics and computational biology, and ships pre-trained weights and inference code.
How It Works
The models are transformer architectures adapted to DNA sequences. Nucleotide Transformer models tokenize DNA into 6-mers and are pre-trained at large scale on diverse human and multi-species genomes. SegmentNT models build on these transformers by replacing the language-model head with a U-Net segmentation head, which lets them localize genomic features at single-nucleotide resolution. Nucleotide Transformer v2 models add architectural improvements such as rotary embeddings and gated linear units, aimed at better efficiency and longer context windows.
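To make the 6-mer tokenization step concrete, here is a minimal Python sketch. It is an illustration only, not the repository's tokenizer: the function name `tokenize_kmers` is hypothetical, and the fallback to single-nucleotide tokens (when a window contains a character other than A, C, G, T, or when fewer than six bases remain) is an assumption made for this example.

```python
def tokenize_kmers(sequence: str, k: int = 6) -> list[str]:
    """Split a DNA string into non-overlapping k-mers, left to right.

    Assumption for this sketch: a window that is shorter than k or that
    contains a character outside A/C/G/T is not emitted as a k-mer;
    instead its first character becomes a single-character token and
    scanning resumes one position later.
    """
    sequence = sequence.upper()
    valid = set("ACGT")
    tokens: list[str] = []
    i = 0
    while i < len(sequence):
        window = sequence[i:i + k]
        if len(window) == k and set(window) <= valid:
            tokens.append(window)       # full, unambiguous k-mer
            i += k
        else:
            tokens.append(sequence[i])  # fall back to one character
            i += 1
    return tokens


if __name__ == "__main__":
    print(tokenize_kmers("ATTCCGATTCCG"))  # two full 6-mers
    print(tokenize_kmers("ATTCCGAT"))      # one 6-mer, then single bases
```

With 6-mers, a sequence of 6,000 bases becomes roughly 1,000 tokens, which is why k-mer tokenization lets a fixed token budget cover a longer stretch of DNA than per-base tokenization would.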
Quick Start & Requirements
Install from the root of a local clone of the repository:
pip install .
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats