DNA foundation model for long-context biological sequence modeling and design
Evo is a biological foundation model for long-context sequence modeling and design, spanning molecular to genome scales. It targets researchers and developers in bioinformatics and synthetic biology who need to analyze and generate DNA sequences over long contexts.
How It Works
Evo uses the StripedHyena architecture, which models DNA sequences at byte-level resolution with compute and memory that scale near-linearly with context length. This makes very long sequences tractable to process, an advantage over standard Transformer models, whose self-attention cost grows quadratically with sequence length. The model is trained on OpenGenome, a large dataset of prokaryotic whole genomes.
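To make the two ideas above concrete, here is a minimal sketch in plain Python. It does not use Evo or StripedHyena code. The helper names, the L * log2(L) stand-in for "near-linear" cost, and the sequence lengths are illustrative assumptions, and Evo's actual tokenizer and cost profile may differ.

```python
import math


def byte_tokenize(sequence: str) -> list[int]:
    """Byte-level tokenization: one token per character, equal to its byte value."""
    return list(sequence.encode("ascii"))


def byte_detokenize(token_ids: list[int]) -> str:
    """Inverse of byte_tokenize."""
    return bytes(token_ids).decode("ascii")


def attention_cost(length: int) -> float:
    # Assumed cost model for self-attention: all pairwise interactions,
    # constants ignored.
    return float(length) ** 2


def near_linear_cost(length: int) -> float:
    # Assumed stand-in for "near-linear" scaling: L * log2(L), constants ignored.
    return length * math.log2(length)


if __name__ == "__main__":
    print(byte_tokenize("ACGT"))  # [65, 67, 71, 84]

    # Illustrative context lengths, chosen only to show the trend.
    for length in (8_192, 131_072):
        ratio = attention_cost(length) / near_linear_cost(length)
        print(f"L={length}: quadratic / near-linear cost ratio is about {ratio:.0f}")
```

Under these assumed cost models the gap widens as the context grows, which is the practical reason a near-linear architecture matters for genome-scale inputs.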
Quick Start & Requirements
Install with pip install evo-model, or from source (git clone, then pip install .).
prodigal is required for specific scripts.
Highlighted Details
Maintenance & Community
The project is associated with the Arc Institute, and the work has been published in Science. The README mentions a recent bug fix for inference that affects specific release versions. Further details on Evo 2 are available at https://github.com/arcinstitute/evo2.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
FlashAttention-2, a key dependency, may not be compatible with all GPU architectures. Users must verify compatibility before installation. The project also points to a separate repository for Evo 2, suggesting ongoing development and potential differences.
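One way to run the compatibility check before installing is to query the GPU's CUDA compute capability. The sketch below assumes that FlashAttention-2 targets Ampere-class GPUs and newer (compute capability 8.0 or higher); that threshold and the function names are assumptions for illustration, so confirm the current requirement in the FlashAttention documentation. PyTorch is imported only if it is available.

```python
def meets_assumed_flash_attn2_minimum(major: int, minor: int = 0) -> bool:
    # Assumption: compute capability >= 8.0 (Ampere or newer) is required.
    # Verify this threshold against the FlashAttention documentation.
    return (major, minor) >= (8, 0)


def check_local_gpu() -> str:
    """Describe whether the first visible CUDA device meets the assumed minimum."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed; cannot query the GPU."

    if not torch.cuda.is_available():
        return "No CUDA device is visible."

    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    verdict = (
        "likely supported"
        if meets_assumed_flash_attn2_minimum(major, minor)
        else "likely unsupported"
    )
    return f"{name} (compute capability {major}.{minor}): {verdict}"


if __name__ == "__main__":
    print(check_local_gpu())
```

The result is only a first-pass signal: the installed CUDA toolkit and PyTorch build also need to match what the FlashAttention release expects.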