bionemo-framework by NVIDIA

Accelerating AI for drug discovery and biomolecular modeling

Created 2 years ago
560 stars

Top 57.3% on SourcePulse

Project Summary

Summary

NVIDIA's BioNeMo Framework is a comprehensive suite for building and adapting AI models in drug discovery at scale. It targets digital biology scientists and researchers, accelerating biomolecular AI model development through domain-specific, optimized tools and models for high-performance GPU computation.

How It Works

The framework provides tools, libraries, and models for computational drug discovery, leveraging NVIDIA's GPU acceleration expertise. It supports both PyTorch Fully Sharded Data Parallel (FSDP) and explicit 5D parallelism (including tensor, pipeline, and context parallelism) via NeMo and Megatron-Core. Key optimizations include NVIDIA TransformerEngine (TE) for accelerated execution and FP8 precision, enabling state-of-the-art performance and scalability.
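To make the FSDP path concrete, here is a minimal sketch of sharded training in plain PyTorch, launched with torchrun. It does not use BioNeMo's own recipes; the model, dimensions, and hyperparameters are placeholders chosen for illustration.

    # Minimal PyTorch FSDP sketch (illustrative; not BioNeMo's training code).
    # Launch with: torchrun --nproc-per-node=<num_gpus> fsdp_sketch.py
    import torch
    import torch.distributed as dist
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    def main():
        dist.init_process_group(backend="nccl")  # torchrun sets RANK/WORLD_SIZE
        torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

        # Placeholder transformer; a BioNeMo model would stand in here.
        layer = torch.nn.TransformerEncoderLayer(d_model=512, nhead=8)
        model = torch.nn.TransformerEncoder(layer, num_layers=6).cuda()

        # FSDP shards parameters, gradients, and optimizer state across ranks.
        model = FSDP(model)
        optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

        x = torch.randn(16, 8, 512, device="cuda")  # (seq, batch, d_model)
        loss = model(x).sum()  # dummy objective for the sketch
        loss.backward()
        optim.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()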

Quick Start & Requirements

Installation is recommended via the pre-built Docker container from NGC (nvcr.io/nvidia/clara/bionemo-framework:nightly). Local development requires cloning the repository with submodules (git clone --recursive) and building the Docker image. An NVIDIA GPU with CUDA support is essential. A VS Code devcontainer simplifies local testing.

Highlighted Details

  • Features TE-accelerated models such as the protein language models AMPLIFY and ESM2, and the single-cell BERT Geneformer.
  • Supports FSDP and explicit 5D parallelism for large-scale training.
  • Integrates TransformerEngine (TE) and FP8 precision for significant performance gains.
  • Includes lightweight, pip-installable data loading tools (e.g., bionemo-noodles, bionemo-scdl); see the sketch after this list.
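Because the data tools install from PyPI, they can be tried outside the container. The sketch below builds bionemo-scdl's memory-mapped dataset from an AnnData (.h5ad) file; the SingleCellMemMapDataset class and its import path follow the bionemo-scdl documentation as I read it, and the file paths are hypothetical, so verify both against the package before relying on them.

    # pip install bionemo-scdl
    # Assumed API (verify against the bionemo-scdl docs): builds a
    # memory-mapped single-cell dataset from an AnnData .h5ad file.
    from bionemo.scdl.io.single_cell_memmap_dataset import SingleCellMemMapDataset

    # First arg: output directory for the memory-mapped files (hypothetical name);
    # second arg: input .h5ad file (hypothetical path).
    dataset = SingleCellMemMapDataset("my_scdl_dataset", "cells.h5ad")
    print(f"rows: {len(dataset)}")  # assumes standard Dataset __len__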

Maintenance & Community

BioNeMo Framework is part of NVIDIA's Biopharma ecosystem. Users can subscribe for release notifications. Specific community channels or a public roadmap are not detailed in the provided README.

Licensing & Compatibility

The specific open-source license is not stated in the provided README content, requiring further investigation for commercial use or integration into closed-source projects.

Limitations & Caveats

Several recipes are end-of-life (EOL) or work-in-progress (WIP), indicating ongoing development. Features such as sequence packing and context parallelism are WIP for certain models. Implementing 5D parallelism requires direct modification of model code. Initial devcontainer setup can be time-consuming.

Health Check

  • Last commit: 10 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 90
  • Issues (30d): 5
  • Star history: 30 stars in the last 30 days
