RNA foundation model for RNA sequence analysis and design
RNA-FM is a foundation model for RNA sequences, offering general-purpose embeddings for diverse downstream tasks like structure prediction and functional analysis. It forms the core of an integrated ecosystem including RhoFold (sequence-to-structure), RiboDiffusion, and RhoDesign (structure-to-sequence design), targeting researchers in RNA therapeutics, synthetic biology, and fundamental RNA biology.
How It Works
RNA-FM is a BERT-style transformer encoder pre-trained on over 23 million non-coding RNA sequences using a masked language model objective. This self-supervised approach extracts rich structural and functional information without labeled data, generating 640-dimensional embeddings. The extended ecosystem leverages these embeddings: RhoFold uses them with a geometry module for accurate tertiary structure prediction, while RiboDiffusion (a diffusion model) and RhoDesign (a GVP+Transformer model) employ them for advanced RNA inverse folding and design.
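To make the masked language model objective concrete, here is a minimal sketch of how a single training example could be built from an RNA sequence. It is an illustration only, not RNA-FM's actual preprocessing: the mask symbol, the 15% mask rate, and the function name mask_sequence are all assumptions, and the sketch omits refinements such as occasionally substituting random tokens instead of the mask.

```python
import random

MASK_TOKEN = "<mask>"  # hypothetical mask symbol, not taken from RNA-FM


def mask_sequence(seq, mask_rate=0.15, seed=None):
    """Build one masked-language-model example from an RNA sequence.

    Returns (tokens, labels). tokens is the sequence with some nucleotides
    replaced by MASK_TOKEN; labels holds the original nucleotide at each
    masked position and None everywhere else.
    """
    rng = random.Random(seed)
    tokens = list(seq)
    # Mask at least one position, but never more than the sequence has.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_rate)))
    positions = rng.sample(range(len(tokens)), n_mask)
    labels = [None] * len(tokens)
    for i in positions:
        labels[i] = tokens[i]   # the model is trained to recover this
        tokens[i] = MASK_TOKEN  # the model only sees the mask
    return tokens, labels


if __name__ == "__main__":
    tokens, labels = mask_sequence("GGGUGCGAUCAUACCAGCAC", seed=0)
    print("".join(t if t != MASK_TOKEN else "_" for t in tokens))
```

During pre-training the encoder would receive tokens and be scored only on the positions where labels is not None, which is why no annotated data is needed.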
Quick Start & Requirements
Create the conda environment (conda env create -f environment.yml), activate it (conda activate RNA-FM), and download the pre-trained models.
Run python launch/predict.py for embedding generation or secondary structure prediction.
Highlighted Details
Maintenance & Community
The project is actively developed by ml4bio, with associated repositories for RhoFold, RiboDiffusion, and RhoDesign. Community support is available via GitHub Issues.
Licensing & Compatibility
The source code is released under the MIT license, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
The README points to a separate server for RhoFold, which suggests that local tertiary structure prediction may need additional setup or dependencies not covered in the main RNA-FM instructions. The mRNA-FM variant requires codon-aligned input, meaning the sequence length must be divisible by 3.
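The codon-alignment requirement can be checked before a sequence is handed to the model. The sketch below is a hypothetical pre-flight helper, not part of the RNA-FM codebase: split_codons is an assumed name, and the check covers only the length rule stated above, not the reading frame or the validity of individual codons.

```python
def split_codons(seq):
    """Split a coding sequence into 3-nucleotide codons.

    Raises ValueError when the length is not divisible by 3, which is the
    only condition checked here.
    """
    if len(seq) % 3 != 0:
        raise ValueError(
            f"sequence length {len(seq)} is not divisible by 3; "
            "input must be codon-aligned"
        )
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


if __name__ == "__main__":
    print(split_codons("AUGGCCUAA"))
```

Raising early gives a clear error message instead of a failure deeper in tokenization.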