Proteina-Complexa by NVIDIA-Digital-Bio

Generative AI for atomistic protein and ligand binder design

Created 1 month ago
306 stars


Summary

Proteina-Complexa is a generative model for atomistic protein binder design, unifying conditional generative modeling and sequence optimization. It targets researchers and developers in drug discovery and protein engineering, offering state-of-the-art performance in designing novel protein and small molecule binders.

How It Works

This project extends flow-based latent protein generation architectures using flow matching, unifying generative and "hallucination" methods via inference-time optimization. It jointly models backbone geometry, side-chain conformations, and sequences. Pretraining utilizes the Teddymer dataset, a large-scale collection of synthetic binder-target pairs derived from predicted protein structures and experimental multimers.
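
To make the flow matching idea concrete, here is a minimal, generic sketch in plain Python. It is not the Proteina-Complexa implementation: the straight-line interpolation path, the function names, and the toy point-mass target are all assumptions chosen for illustration. It shows how a training example pairs an interpolated point with a target velocity, and how sampling integrates a velocity field from noise at t = 0 to data at t = 1.

```python
import random


def sample_training_pair(x1, rng):
    """Build one flow matching training example for a data point x1.

    Assumes a straight-line path from Gaussian noise x0 to x1 (illustrative
    choice; the actual model may use a different path or latent space).
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]                 # noise sample
    t = rng.random()                                       # time in [0, 1)
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]   # point on the path
    target_velocity = [b - a for a, b in zip(x0, x1)]      # d(xt)/dt along the path
    return xt, t, target_velocity


def euler_sample(velocity_fn, x0, num_steps=100):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        v = velocity_fn(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def point_target_velocity(mu):
    """Closed-form velocity field when the data distribution is the single point mu.

    Stands in for a trained network in this toy example.
    """
    def velocity(x, t):
        return [(m - xi) / (1.0 - t) for m, xi in zip(mu, x)]
    return velocity


if __name__ == "__main__":
    rng = random.Random(0)
    mu = [1.0, -2.0, 0.5]
    noise = [rng.gauss(0.0, 1.0) for _ in mu]
    print(euler_sample(point_target_velocity(mu), noise))
```

In a real model, a neural network would be trained to regress `target_velocity` from `(xt, t)`, and inference-time optimization would act on top of the sampling loop.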

Quick Start & Requirements

  • Installation: The recommended route is a uv environment (run ./env/build_uv_env.sh, then source .venv/bin/activate). Docker is the alternative (docker build, then docker run --gpus all).
  • Prerequisites: Ubuntu 22.04+ for the uv route (otherwise use Docker). The Docker route requires GPU access. On Python 3.12, installing tmol requires a workaround. After installation, run complexa init and complexa download --all. Environment variables for the reward models (AF2, RF3) and external tools (Foldseek, MMseqs2, DSSP, SC) must be set in .env.
  • Links: Paper, Project Page.
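
Because several reward models and tools are configured through .env, a quick pre-flight check can catch unset entries before a run. The sketch below is an assumption-heavy illustration: the summary does not name the actual variables, so the `EXAMPLE_*` keys are hypothetical placeholders to be replaced with the names from the project's own .env template, and the parser handles only simple KEY=VALUE lines.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):          # tolerate shell-style "export KEY=..."
            key = key[len("export "):].strip()
        values[key] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Read and parse a .env file from disk."""
    return parse_env_text(Path(path).read_text())


def missing_keys(env_values, required):
    """Return the required keys that are absent or set to an empty string."""
    return [key for key in required if not env_values.get(key)]


if __name__ == "__main__":
    # Hypothetical key names, NOT taken from the project.
    required = ["EXAMPLE_AF2_DIR", "EXAMPLE_RF3_DIR", "EXAMPLE_FOLDSEEK_BIN"]
    if Path(".env").exists():
        print("Unset entries:", missing_keys(load_env_file(), required))
    else:
        print("No .env file found in the current directory.")
```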

Highlighted Details

  • Achieves state-of-the-art binder design performance, with higher in-silico success rates than prior methods.
  • Test-time optimization strategies outperform prior hallucination methods under normalized compute.
  • Supports protein binder design, small-molecule ligand binder design, and motif scaffolding (AME).
  • Designs have been experimentally validated, confirming that in-silico success translates to binding activity.

Maintenance & Community

The project is associated with NVIDIA and academic institutions, and its core contributors are listed in the README. Links to the paper and project page are provided. No explicit community channels are mentioned.

Licensing & Compatibility

License details are in a LICENSE file. The specific license type and compatibility for commercial use are not detailed in the README.

Limitations & Caveats

  • Known tmol installation issue on Python 3.12 requires a workaround.
  • The tmol reward model is unsupported for protein-ligand complexes in the ligand binder and AME pipelines.
  • AME input PDBs must follow strict conventions to avoid errors: chain A for the ligand, chain B for the motif, and residue naming L:0 for the ligand.
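
The AME naming conventions can be checked before launching a run. The following is a rough sketch under stated assumptions: it reads "L:0" as the three-character residue-name field (columns 18-20) of atom records on the ligand chain, uses the standard fixed-column PDB layout, and is not part of the Proteina-Complexa codebase. Both `check_ame_pdb` and `format_atom_line` are hypothetical helper names.

```python
def format_atom_line(record, serial, atom_name, resname, chain, resseq,
                     x=0.0, y=0.0, z=0.0):
    """Format a minimal fixed-column PDB atom record (for building test inputs)."""
    return (f"{record:<6}{serial:>5} {atom_name:<4} {resname:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def check_ame_pdb(pdb_text, ligand_chain="A", motif_chain="B", ligand_resname="L:0"):
    """Return a list of convention problems; an empty list means none were found.

    Assumption: every atom on the ligand chain should carry ligand_resname.
    """
    problems = []
    chain_resnames = {}
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if len(line) < 22:
            problems.append(f"atom record too short to hold a chain ID: {line!r}")
            continue
        resname = line[17:20].strip()   # PDB columns 18-20
        chain = line[21]                # PDB column 22
        chain_resnames.setdefault(chain, set()).add(resname)

    if ligand_chain not in chain_resnames:
        problems.append(f"no atoms found on ligand chain {ligand_chain}")
    else:
        wrong = sorted(chain_resnames[ligand_chain] - {ligand_resname})
        if wrong:
            problems.append(
                f"ligand chain {ligand_chain} has residue names {wrong}, "
                f"expected only {ligand_resname!r}")
    if motif_chain not in chain_resnames:
        problems.append(f"no atoms found on motif chain {motif_chain}")
    return problems


if __name__ == "__main__":
    example = "\n".join([
        format_atom_line("HETATM", 1, "C1", "L:0", "A", 1),
        format_atom_line("ATOM", 2, "CA", "HIS", "B", 10),
    ])
    print(check_ame_pdb(example))
```

The chain IDs and residue name are parameters so the check can be adjusted if the project's conventions turn out to differ from this reading.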

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 29
  • Star history: 307 stars in the last 30 days
