RFdiffusion by RosettaCommons

Protein structure generation via diffusion

Created 2 years ago

2,689 stars

Top 17.4% on SourcePulse

Project Summary

RFdiffusion is an open-source tool for protein structure generation. It designs novel protein backbones and folds, either unconditionally or under constraints such as motifs to scaffold or target interfaces to bind. It is aimed primarily at computational and structural biologists who want to design proteins with tailored functions, such as binders or symmetric protein complexes.

How It Works

RFdiffusion is a diffusion model built on the RoseTTAFold architecture. Starting from random noise, it iteratively denoises the sample into a protein backbone structure. Generation can be conditioned on structural motifs, target protein interfaces (via "hotspot residues"), secondary structure, and symmetry constraints, which gives control over the architecture of the generated protein and, through it, the intended function.
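
The idea of iterative denoising can be illustrated with a deliberately simplified toy. The sketch below is not RFdiffusion's algorithm or API: the function names are hypothetical, the "network" is a stand-in that always predicts an extended chain with 3.8 Å spacing between residues, and the update rule is an assumed linear blend chosen only to show the loop structure (start from noise, repeatedly move toward the prediction, add less noise at each step).

```python
import random


def predict_clean(coords):
    """Stand-in for a learned denoiser (assumption, not the real model).

    It ignores its input and always predicts an extended chain with
    3.8 Angstrom spacing along the x axis.
    """
    return [(3.8 * i, 0.0, 0.0) for i in range(len(coords))]


def toy_reverse_diffusion(n_residues, n_steps=50, seed=0):
    """Denoise random 3D points into the stand-in prediction."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise: one 3D point per residue.
    coords = [
        tuple(rng.gauss(0.0, 10.0) for _ in range(3))
        for _ in range(n_residues)
    ]
    for t in range(n_steps, 0, -1):
        predicted = predict_clean(coords)
        blend = 1.0 / t                        # step size grows; equals 1 at t == 1
        noise_scale = 0.5 * (t - 1) / n_steps  # shrinks to 0 at the final step
        coords = [
            tuple(
                c + blend * (p - c) + rng.gauss(0.0, noise_scale)
                for c, p in zip(point, target)
            )
            for point, target in zip(coords, predicted)
        ]
    return coords


if __name__ == "__main__":
    for x, y, z in toy_reverse_diffusion(5):
        print(f"{x:8.3f} {y:8.3f} {z:8.3f}")
```

In the real model the prediction at each step comes from a trained network and depends on the current noisy structure and any conditioning inputs; here it is fixed so that the loop is easy to follow.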

Quick Start & Requirements

Installation: Clone the repository, download model weights, install dependencies via Conda (env/SE3nv.yml), and install RFdiffusion (pip install -e .).
Prerequisites: CUDA 11.1 by default (the environment can be adapted to other CUDA versions), Anaconda or Miniconda.
Setup Time: Less than 30 minutes.
Documentation: RFdiffusion README
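
Before starting the installation, it may help to confirm that the prerequisite tools are reachable. The sketch below is a hypothetical pre-flight check, not part of RFdiffusion: it only tests whether the named executables are on PATH. It assumes that the presence of `nvidia-smi` is a reasonable hint that an NVIDIA driver is installed, and it does not verify the CUDA version.

```python
import shutil


def check_prerequisites(tools=("conda", "nvidia-smi", "git")):
    """Return a mapping of tool name -> True if the executable is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


if __name__ == "__main__":
    for tool, found in check_prerequisites().items():
        print(f"{tool}: {'found' if found else 'missing'}")
```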

Highlighted Details

Supports motif scaffolding, unconditional generation, symmetric designs (cyclic, dihedral, tetrahedral), binder design, and sequence inpainting.
Offers fine-tuned models for specific tasks like active site scaffolding and beta-strand binder design.
Integrates with external tools like ProteinMPNN for sequence design and AlphaFold2 for filtering.
Enables fold conditioning using secondary structure and adjacency information.

Maintenance & Community

Developed by RosettaCommons, with contributions from researchers at the University of Washington Institute for Protein Design.
The repository is described as actively developed, with further updates planned.
Users are encouraged to create GitHub issues for support.

Licensing & Compatibility

Released under a BSD License, which permits both non-profit and for-profit use, including in commercial applications.

Limitations & Caveats

The runtime scales quadratically with the number of residues, making large targets computationally intensive.
Designing binders to charged polar sites or sites near glycans can be challenging.
Output sequences for designed regions are initially poly-Glycine, requiring a separate sequence design step.
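
Two of these caveats lend themselves to quick back-of-the-envelope checks. The sketch below uses hypothetical helper names and rests on two assumptions: that runtime is dominated by the quadratic term, so halving the residue count cuts runtime to roughly a quarter, and that a long run of glycines in an output sequence marks a designed region that still needs sequence design. The second is only a heuristic, since a native glycine-rich stretch would also be flagged.

```python
import re


def relative_runtime(n_residues, n_reference):
    """Runtime relative to a reference system, assuming pure O(N^2) scaling."""
    return (n_residues / n_reference) ** 2


def glycine_runs(sequence, min_len=5):
    """Return (start, end) spans, 0-based and end-exclusive, of glycine runs."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), m.end()) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    # Truncating a 600-residue system to 300 residues.
    print(relative_runtime(300, 600))   # 0.25
    # Residues 2 through 7 are glycine placeholders.
    print(glycine_runs("MKGGGGGGAL"))   # [(2, 8)]
```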

Health Check

Last Commit: 1 month ago
Responsiveness: 1 day
Star History: 58 stars in the last 30 days