RFdiffusion is an open-source tool for generating protein structures. It enables the design of novel protein backbones and folds, with or without constraints such as motifs or target interfaces. It is aimed primarily at computational and structural biologists who want to design proteins with tailored functions, such as binders or symmetric protein complexes.
How It Works
RFdiffusion is a diffusion model built on RoseTTAFold. It starts from random noise and iteratively denoises it into a protein structure. Generation can be conditioned on several inputs, including motifs, target protein interfaces (via "hotspot residues"), secondary structure, and symmetry constraints, which gives control over the generated protein's architecture and function.
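The iterative denoising idea can be sketched with a toy loop. This is not RFdiffusion's code: the function name `toy_denoise` is hypothetical, and the "denoiser" is a stand-in that simply returns a known target, whereas a trained network predicts the structure from the noisy input and any conditioning information.

```python
import random


def toy_denoise(target, steps=50, seed=0):
    """Toy reverse-diffusion loop over per-residue (x, y, z) coordinates.

    `target` plays the role of the network's prediction. A real model is
    never given the answer; it predicts it at every step.
    """
    rng = random.Random(seed)
    # Start from pure noise: one random 3D point per residue.
    coords = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]
    for t in range(steps, 0, -1):
        # Hypothetical denoiser stand-in: returns the target unchanged.
        predicted = target
        # Injected noise shrinks linearly and is exactly zero at the last step.
        noise_scale = 0.1 * (t - 1) / steps
        # Move a fraction 1/t of the way to the prediction, so t == 1 lands on it.
        coords = [
            [c + (p - c) / t + rng.gauss(0.0, noise_scale) for c, p in zip(row, pred)]
            for row, pred in zip(coords, predicted)
        ]
    return coords


# Three points spaced 3.8 units apart along x, as a tiny stand-in structure.
backbone = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0]]
print(toy_denoise(backbone))
```

In this picture, conditioning would enter as extra inputs to the denoiser, so that each prediction respects the requested motif, hotspots, or symmetry.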
Quick Start & Requirements
- Installation: Clone the repository, download model weights, install dependencies via Conda (env/SE3nv.yml), and install RFdiffusion (pip install -e .).
- Prerequisites: CUDA 11.1 (customizable for other versions), Anaconda/Miniconda.
- Setup Time: Less than 30 minutes.
- Documentation: RFdiffusion README
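Once installed, a first unconditional run might be assembled as below. This is a sketch only: the script path and the option names (`contigmap.contigs`, `inference.output_prefix`, `inference.num_designs`) are recalled from the upstream README rather than stated in this summary, so check them against the repository before use. The helper `build_inference_command` is hypothetical.

```python
import shlex


def build_inference_command(output_prefix, length, num_designs=1):
    """Assemble an unconditional-generation command as an argv list.

    Option names are assumptions taken from the upstream README.
    """
    return [
        "python",
        "scripts/run_inference.py",
        # Fixed-length design: minimum and maximum length are the same.
        f"contigmap.contigs=[{length}-{length}]",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
    ]


cmd = build_inference_command("test_outputs/test", length=150, num_designs=10)
# shlex.join quotes the bracketed argument so it can be pasted into a shell.
print(shlex.join(cmd))
```

Building the command as a list keeps the square brackets away from shell globbing when it is passed to `subprocess.run`.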
Highlighted Details
- Supports motif scaffolding, unconditional generation, symmetric designs (cyclic, dihedral, tetrahedral), binder design, and sequence inpainting.
- Offers fine-tuned models for specific tasks like active site scaffolding and beta-strand binder design.
- Integrates with external tools like ProteinMPNN for sequence design and AlphaFold2 for filtering.
- Enables fold conditioning using secondary structure and adjacency information.
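To make the cyclic case in the list above concrete, the sketch below shows what Cn symmetry means geometrically: n copies of one subunit, each rotated by a multiple of 360/n degrees about a shared axis. It illustrates the geometry only and makes no claim about how RFdiffusion applies symmetry internally; `cyclic_copies` is a hypothetical helper.

```python
import math


def cyclic_copies(coords, n):
    """Return n copies of `coords`, copy k rotated by 2*pi*k/n about the z-axis."""
    copies = []
    for k in range(n):
        angle = 2.0 * math.pi * k / n
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        # Standard 2D rotation in the xy-plane; z is left unchanged.
        copies.append([
            (x * cos_a - y * sin_a, x * sin_a + y * cos_a, z)
            for x, y, z in coords
        ])
    return copies


# One subunit with two points, expanded into a C4 assembly.
subunit = [(10.0, 0.0, 0.0), (12.0, 1.0, 3.0)]
for chain in cyclic_copies(subunit, 4):
    print(chain)
```

Dihedral and tetrahedral symmetries follow the same pattern with larger sets of rotations.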
Maintenance & Community
- Developed by the RosettaCommons, with contributions from researchers at the University of Washington Institute for Protein Design.
- The project is described as under active development, with future updates planned.
- Users are encouraged to create GitHub issues for support.
Licensing & Compatibility
- Released under a BSD License, permitting both non-profit and for-profit use.
- Compatible with commercial applications.
Limitations & Caveats
- The runtime scales quadratically with the number of residues, making large targets computationally intensive.
- Designing binders to charged polar sites or sites near glycans can be challenging.
- Output sequences for designed regions are initially poly-Glycine, requiring a separate sequence design step.
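The quadratic scaling noted above gives a quick way to estimate how much trimming a large target could save. The sketch assumes runtime is exactly proportional to the square of the total residue count and ignores fixed overheads such as model loading; `relative_runtime` is a hypothetical helper.

```python
def relative_runtime(n_residues, n_reference):
    """Estimated runtime ratio under purely quadratic scaling in residue count."""
    if n_residues <= 0 or n_reference <= 0:
        raise ValueError("residue counts must be positive")
    return (n_residues / n_reference) ** 2


# A 600-residue system compared with the same system cropped to 300 residues.
print(relative_runtime(600, 300))  # → 4.0
```

Under this assumption, halving the number of residues cuts the estimated cost to about a quarter.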