minAlphaFold2 by ChrisHayduk

Minimal PyTorch AlphaFold2 reimplementation for learning and research

Created 2 months ago
584 stars

Top 55.4% on SourcePulse

View on GitHub
1 Expert Loves This Project
Project Summary

A minimal, pedagogical PyTorch reimplementation of AlphaFold2, designed for understanding and modification. It targets AI researchers and engineers seeking to demystify AlphaFold2's architecture and training, thereby accelerating progress in AI x biology by making the system accessible.

How It Works

The project offers a direct, 1-to-1 reimplementation of AlphaFold2's model and training pipeline using only core PyTorch primitives. Each module maps precisely to a numbered algorithm in the official supplement, prioritizing readability over production optimizations. This approach enables end-to-end training on a single GPU through gradient accumulation and checkpointing, making the complex system comprehensible within an afternoon.
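The single-GPU recipe described above can be sketched in a few lines of PyTorch. This is a hedged illustration, not the repo's actual trainer: the model, micro-batch size, and loss below are toy stand-ins, while the two memory tricks (gradient accumulation and gradient checkpointing) are the real techniques named in the summary.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Toy stand-in for the real Evoformer/structure-module stack (assumption).
model = nn.Sequential(*[nn.Linear(32, 32) for _ in range(4)])
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

accum_steps = 4  # effective batch = accum_steps * micro-batch size


def forward_checkpointed(x):
    # Gradient checkpointing: recompute each layer's activations during
    # backward instead of storing them, trading compute for memory.
    for layer in model:
        x = checkpoint(layer, x, use_reentrant=False)
    return x


opt.zero_grad()
for _ in range(accum_steps):
    x = torch.randn(8, 32)                 # one micro-batch
    loss = forward_checkpointed(x).pow(2).mean()  # placeholder loss
    (loss / accum_steps).backward()        # scale so grads average over micro-batches
opt.step()                                 # one optimizer step per accumulated batch
```

Together, accumulation keeps the effective batch size at paper scale while only one micro-batch's activations (and, with checkpointing, only a fraction of those) live in GPU memory at a time.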

Quick Start & Requirements

  • Installation: pip install -e '.[dev]'
  • Prerequisites: PyTorch, NumPy, pytest. Optional: OpenMM, pdbfixer for structure relaxation. Modal integration requires pip install -e '.[modal]'.
  • Hardware: Trainable on a single GPU; Modal runners support H200/A100 GPUs.
  • Setup: A 5-minute sanity check involves overfitting a single PDB on CPU. Full training requires significant time (~7 days on TPUv3 for Stage 1) and ~100 GB for data uploads to Modal.
  • Documentation: Primary reference is the AlphaFold2 supplement.

Highlighted Details

  • Pure PyTorch: Utilizes only nn.Linear, nn.LayerNorm, torch.einsum, and standard activations.
  • Supplement Mapping: Every file and module directly corresponds to a numbered algorithm in the AlphaFold2 supplement.
  • Paper-Spec Training: Implements the two-stage training recipe, including LR schedules, parameter EMA, and violation loss.
  • Single GPU Trainability: Achieved via gradient accumulation and gradient checkpointing.
  • Unit Consistency: Structure module operates internally in nanometers, with explicit conversions at boundaries.
  • Zero-Initialization: Implements zero-init for output projections and gate biases per §1.11.4.
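The zero-initialization scheme from the last bullet can be sketched as follows. This is a hedged reading of §1.11.4 of the AlphaFold2 supplement, not code copied from this repo: output projections start at zero so each residual block is the identity at initialization, and gating linears start with zero weights and unit bias so their sigmoid gates open near one.

```python
import torch
import torch.nn as nn


def zero_init_output(linear: nn.Linear) -> nn.Linear:
    # Output projection: zero weights (and bias), so the block
    # contributes nothing to the residual stream at initialization.
    nn.init.zeros_(linear.weight)
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)
    return linear


def gate_init(linear: nn.Linear) -> nn.Linear:
    # Gating linear: zero weights, bias 1, so sigmoid(1) ~= 0.73 and the
    # gate starts mostly open regardless of the input.
    nn.init.zeros_(linear.weight)
    nn.init.ones_(linear.bias)
    return linear


out_proj = zero_init_output(nn.Linear(64, 64))
gate = gate_init(nn.Linear(64, 64))

x = torch.randn(2, 64)
y = out_proj(x)                    # all zeros at init
g = torch.sigmoid(gate(x))         # all ~0.73 at init
```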

Maintenance & Community

No specific details on active maintenance, community channels (like Discord/Slack), or a public roadmap are provided in the README.

Licensing & Compatibility

The project is licensed under MIT. Data derived from the AlphaFold2 source code is under Apache 2.0. The MIT license is permissive, allowing commercial use and incorporation into closed-source projects.

Limitations & Caveats

This is a pedagogical reimplementation, not an inference harness or a speed benchmark. It supports monomers only; multimer and AF3 architectures are excluded, as are self-distillation dataset generation, custom MSA generation, and the paper's MMseqs2 clustering. Structure relaxation has caveats for highly violating inputs, so training with the violation loss active is recommended for cleaner outputs. The trainer lacks distributed data parallelism and relies on per-micro-batch gradient clipping for larger micro-batches.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 137 stars in the last 30 days

Explore Similar Projects

Starred by Victor Taelin (Author of Bend, Kind, HVM), Sebastian Raschka (Author of "Build a Large Language Model (From Scratch)"), and 2 more.

nanoT5 by PiotrNawrot

0%
1k
PyTorch code for T5 pre-training and fine-tuning on a single GPU
Created 3 years ago
Updated 1 year ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Lewis Tunstall (Research Engineer at Hugging Face), and 15 more.

torchtune by meta-pytorch

0.1%
6k
PyTorch library for LLM post-training and experimentation
Created 2 years ago
Updated 1 day ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of ClickHouse), and 29 more.

llm.c by karpathy

0.2%
30k
LLM training in pure C/CUDA, no PyTorch needed
Created 2 years ago
Updated 10 months ago