Chemformer by MolecularAI

Transformer model for chemistry tasks

Created 4 years ago · 269 stars · Top 95.5% on SourcePulse

Project Summary

Chemformer provides pre-trained BART transformer models for molecular tasks, specifically designed for chemists and researchers in drug discovery and chemical synthesis. It aims to improve generalization, performance, training speed, and validity on downstream tasks by pre-training on molecular SMILES strings using a denoising objective.

How It Works

Chemformer leverages a BART transformer architecture pre-trained on a large corpus of molecular SMILES strings with a denoising objective, which lets the model learn rich representations of molecular structures and chemical transformations. The project provides implementations for downstream tasks including reaction prediction, retrosynthetic prediction, molecular optimization, and molecular property prediction, using sequence-to-sequence modeling and a disconnection-aware approach to retrosynthesis.

Quick Start & Requirements

  • Installation: Clone the repository, create a Conda environment (conda env create -f env-dev.yml), activate it (conda activate chemformer), and install dependencies (poetry install); the commands are consolidated in the first sketch after this list.
  • Prerequisites: Linux, Windows, or macOS with Python 3.7 and Anaconda/Miniconda. A workaround for GLIBCXX_3.4.21 errors is to adjust LD_LIBRARY_PATH.
  • Usage: Fine-tune models by downloading pre-trained checkpoints and datasets, then updating and running the provided shell scripts (e.g., fine_tune.sh); configuration is managed via Hydra (see the second sketch after this list).
  • Resources: Links to public models and datasets are available.
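
A minimal setup sketch consolidating the steps above. The final export line is an assumed form of the documented LD_LIBRARY_PATH workaround (pointing at the active Conda environment's lib directory), not a command taken from the repository:

    # Clone the repository and enter it
    git clone https://github.com/MolecularAI/Chemformer.git
    cd Chemformer

    # Create and activate the Conda environment, then install dependencies
    conda env create -f env-dev.yml
    conda activate chemformer
    poetry install

    # Assumed workaround for GLIBCXX_3.4.21 errors: prepend the environment's
    # lib directory so its newer libstdc++ is found before the system one
    export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"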
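
Fine-tuning follows the pattern above: download a pre-trained checkpoint and dataset, point the script at them, and run it. A hedged sketch; the override keys shown are illustrative assumptions, not the repository's actual configuration names:

    # Edit the checkpoint/dataset paths inside the script, then run it
    bash fine_tune.sh

    # Since configuration is managed via Hydra, parameters can also be
    # overridden on the command line in key=value form, e.g. (entry point
    # and keys are illustrative assumptions):
    #   python <entry_point>.py data_path=<dataset> model_path=<checkpoint>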

Highlighted Details

  • Pre-trained BART transformer for molecular SMILES.
  • Supports reaction prediction, retrosynthesis, molecular optimization, and property prediction.
  • Implements seq2seq and disconnection-aware retrosynthesis.
  • Includes FastAPI services for predictions and integration with tools like AiZynthFinder.
  • Offers custom callbacks and scorers for flexible evaluation.

Maintenance & Community

The project welcomes contributions via issues and pull requests. Support is provided through the issue tracker, with limited time for direct support questions.

Licensing & Compatibility

The software is licensed under the MIT license, which permits free use, modification, and commercial use.

Limitations & Caveats

Pre-trained checkpoints may need updating when moving to new releases. The project pins specific dependency versions, and on some systems environment setup requires the LD_LIBRARY_PATH workaround noted above.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 13 stars in the last 30 days

Explore Similar Projects

dots.llm1 by rednote-hilab
  • MoE model for research
  • Top 0.2% · 462 stars · created 4 months ago · updated 4 weeks ago
  • Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI)

SwissArmyTransformer by THUDM
  • Transformer library for flexible model development
  • Top 0.3% · 1k stars · created 4 years ago · updated 8 months ago
  • Starred by Jeremy Howard (cofounder of fast.ai) and Stas Bekman (author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake)