RFdiffusion by RosettaCommons

Protein structure generation via diffusion

created 2 years ago
2,330 stars

Top 20.0% on sourcepulse

Project Summary

RFdiffusion is an open-source protein structure generation tool for designing novel protein backbones and folds, with or without constraints such as motifs or target interfaces. It is aimed primarily at computational and structural biologists who want to design proteins with tailored functions, such as binders or symmetric protein complexes.

How It Works

RFdiffusion is a diffusion model built on RoseTTAFold. It generates a structure by starting from random noise and iteratively denoising it into a protein backbone. The model can be conditioned on several inputs, including structural motifs, target protein interfaces (via "hotspot residues"), secondary structure, and symmetry constraints, which gives control over the architecture and intended function of the generated protein.
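
The iterative denoising idea can be illustrated with a toy loop. This is a minimal sketch under stated assumptions, not RFdiffusion's implementation: `toy_denoiser` is a hypothetical stand-in for the trained network and simply returns a known target, the points are bare 3D coordinates rather than residue frames, and the `1 / step` blending schedule and noise schedule are assumptions chosen for simplicity.

```python
import random


def toy_denoiser(noisy_coords, target):
    """Hypothetical stand-in for the trained network.

    A real model predicts the clean structure from the noisy input (plus any
    conditioning); this toy version just returns a known target.
    """
    return target


def reverse_diffusion(target, num_steps=50, noise_scale=1.0, seed=0):
    """Turn Gaussian noise into `target` through repeated small denoising steps."""
    rng = random.Random(seed)
    # Start from pure noise: one random (x, y, z) point per residue.
    coords = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]

    for step in range(num_steps, 0, -1):
        predicted = toy_denoiser(coords, target)
        # Assumed schedule: blend a little at first, fully on the final step.
        blend = 1.0 / step
        # Re-inject some noise on every step except the final one (sigma is 0 there).
        sigma = noise_scale * (step - 1) / num_steps
        coords = [
            [c + blend * (p - c) + rng.gauss(0.0, sigma) for c, p in zip(point, pred)]
            for point, pred in zip(coords, predicted)
        ]
    return coords


if __name__ == "__main__":
    # Assumed toy "structure": four points on a straight line, 3.8 units apart.
    toy_target = [[3.8 * i, 0.0, 0.0] for i in range(4)]
    for point in reverse_diffusion(toy_target):
        print([round(value, 3) for value in point])
```

In this sketch, conditioning would enter through the denoiser: a network that also receives a motif, hotspots, or a symmetry specification steers every step toward structures that satisfy them.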

Quick Start & Requirements

  • Installation: Clone the repository, download model weights, install dependencies via Conda (env/SE3nv.yml), and install RFdiffusion (pip install -e .).
  • Prerequisites: CUDA 11.1 by default (the environment can be adjusted for other CUDA versions), Anaconda/Miniconda.
  • Setup Time: Less than 30 minutes.
  • Documentation: RFdiffusion README
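
The installation steps above can be sketched as a small dry-run script. This is an illustration under assumptions: the helper names `install_plan` and `run_plan` are hypothetical, the environment created from `env/SE3nv.yml` is assumed to be named `SE3nv`, and the model-weight download is left as a comment because the links live in the README. The README may list further steps that this sketch does not cover.

```python
import shlex
import subprocess

REPO_URL = "https://github.com/RosettaCommons/RFdiffusion.git"


def install_plan(repo_dir="RFdiffusion"):
    """Return the install steps from the list above as argv lists."""
    return [
        ["git", "clone", REPO_URL, repo_dir],
        # Model weights are downloaded separately; see the README for the links.
        ["conda", "env", "create", "-f", f"{repo_dir}/env/SE3nv.yml"],
        # Assumes the environment defined by SE3nv.yml is named "SE3nv".
        ["conda", "run", "-n", "SE3nv", "pip", "install", "-e", repo_dir],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is installed until the plan has been reviewed.
    run_plan(install_plan())
```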

Highlighted Details

  • Supports motif scaffolding, unconditional generation, symmetric designs (cyclic, dihedral, tetrahedral), binder design, and sequence inpainting.
  • Offers fine-tuned models for specific tasks like active site scaffolding and beta-strand binder design.
  • Integrates with external tools like ProteinMPNN for sequence design and AlphaFold2 for filtering.
  • Enables fold conditioning using secondary structure and adjacency information.
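
Several of the design modes above are selected through command-line overrides passed to the inference script. The sketch below assembles such commands without running them. The option names (`contigmap.contigs`, `inference.output_prefix`, `inference.num_designs`, `inference.input_pdb`, `ppi.hotspot_res`) follow the RFdiffusion README and should be checked against the current version; the helper `build_inference_command` and all file paths are hypothetical.

```python
import shlex


def build_inference_command(output_prefix, contigs, num_designs=10,
                            input_pdb=None, hotspots=None):
    """Assemble a run_inference.py command line (hypothetical helper)."""
    argv = [
        "./scripts/run_inference.py",
        f"contigmap.contigs=[{contigs}]",
        f"inference.output_prefix={output_prefix}",
        f"inference.num_designs={num_designs}",
    ]
    if input_pdb is not None:
        # Needed whenever the contig string refers to chains in an existing structure.
        argv.append(f"inference.input_pdb={input_pdb}")
    if hotspots:
        # Hotspot residues on the target that the binder should contact.
        argv.append(f"ppi.hotspot_res=[{','.join(hotspots)}]")
    return argv


if __name__ == "__main__":
    examples = {
        # Unconditional generation: a 150-residue protein with no constraints.
        "unconditional": build_inference_command("out/uncond", "150-150"),
        # Motif scaffolding: keep chain A residues 163-181 and build
        # 10-40 new residues on either side.
        "motif": build_inference_command(
            "out/motif", "10-40/A163-181/10-40", input_pdb="motif_source.pdb"
        ),
        # Binder design: keep target residues A1-150, insert a chain break,
        # then generate a 70-100 residue binder aimed at three hotspots.
        "binder": build_inference_command(
            "out/binder", "A1-150/0 70-100",
            input_pdb="target.pdb", hotspots=["A59", "A83", "A91"],
        ),
    }
    for name, argv in examples.items():
        print(name, "->", shlex.join(argv))
```

Building the argument list rather than a single string keeps the bracketed contig expressions intact when the command is later passed to a shell or to `subprocess.run`.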

Maintenance & Community

  • Developed by the RosettaCommons, with contributions from researchers at the University of Washington Institute for Protein Design.
  • The project appears to be under active development, with further updates planned.
  • Users are encouraged to create GitHub issues for support.

Licensing & Compatibility

  • Released under a BSD license, which permits both non-profit and for-profit (commercial) use.

Limitations & Caveats

  • Runtime scales quadratically with the number of residues, so large targets are computationally expensive.
  • Designing binders to charged polar sites or sites near glycans can be challenging.
  • Output sequences for designed regions are initially poly-Glycine, requiring a separate sequence design step.
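
The quadratic scaling noted above makes it easy to estimate how much a smaller system helps. The sketch below is back-of-the-envelope arithmetic only: `relative_runtime` is a hypothetical helper, it assumes pure O(N^2) behaviour, and it ignores constant overheads and GPU memory limits.

```python
def relative_runtime(num_residues, reference_residues):
    """Estimated runtime ratio between two system sizes under O(N^2) scaling."""
    if num_residues <= 0 or reference_residues <= 0:
        raise ValueError("residue counts must be positive")
    return (num_residues / reference_residues) ** 2


if __name__ == "__main__":
    # Assumed example: a 100-residue binder against a full 400-residue target
    # (500 residues in total) versus the same binder against a target
    # truncated to 150 residues (250 residues in total).
    full_system = 100 + 400
    truncated_system = 100 + 150
    ratio = relative_runtime(full_system, truncated_system)
    print(f"Full system costs roughly {ratio:.1f}x the truncated system")
```

Halving the total residue count in this example cuts the estimated runtime to about a quarter.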

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 3
  • Issues (30d): 17

Star History

  • 182 stars in the last 90 days
