esm by evolutionaryscale

Protein models & API for generative tasks and representation learning

created 1 year ago
2,025 stars

Top 22.4% on sourcepulse

Project Summary

This repository provides access to EvolutionaryScale's flagship protein language models, ESM3 (generative) and ESM C (representation learning). It's designed for researchers and developers in bioinformatics and computational biology seeking advanced tools for protein sequence, structure, and function prediction and generation. The library offers a unified interface for local execution and cloud-based inference via the EvolutionaryScale Forge API and AWS SageMaker.

How It Works

ESM3 is a multimodal, generative masked language model that reasons across protein sequence, structure, and function. It is built on a scalable transformer backbone and generates iteratively: masked tokens are sampled and filled in over a number of steps. ESM C is a parallel model family focused on representation learning. It is designed as a drop-in replacement for ESM2 and offers performance and efficiency gains over it. Both models leverage discrete token representations for their respective tasks.
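
The idea of iterative generation by sampling masked tokens can be illustrated with a toy sketch. This is not ESM3's decoding code: the toy_predict function is a hypothetical stand-in that picks a random amino acid, whereas a real model would sample from its predicted distribution at each position. The "_" mask character, the unmasking schedule, and all function names here are assumptions made for illustration.

```python
import random

MASK = "_"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_predict(tokens, position, rng):
    """Hypothetical stand-in for a model: returns a random amino acid.

    A real masked language model would condition on the unmasked tokens
    and sample from its predicted distribution at this position.
    """
    return rng.choice(AMINO_ACIDS)


def iterative_unmask(prompt, num_steps, seed=0):
    """Fill every masked position over at most num_steps rounds."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        steps_left = num_steps - step
        # Ceiling division: spread the remaining masks over the remaining
        # steps so that nothing is left masked after the final step.
        n_fill = -(-len(masked) // steps_left)
        for i in rng.sample(masked, n_fill):
            tokens[i] = toy_predict(tokens, i, rng)
    return "".join(tokens)


if __name__ == "__main__":
    print(iterative_unmask("__AC__", num_steps=3))
```

Unmasked residues in the prompt are never changed, so the prompt acts as a fixed context that the filled positions are built around.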

Quick Start & Requirements

  • Install via pip: pip install esm
  • Requires PyTorch; a CUDA-enabled GPU is recommended for local execution.
  • Hugging Face Hub login is required for model weight downloads.
  • Forge API access requires an API token.
  • SageMaker deployment involves AWS account setup and CloudFormation stack creation.
  • Instantiating a model locally downloads its weights from the Hugging Face Hub.
  • See ESM3 Quickstart and ESM C Quickstart for detailed examples.
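
The local workflow listed above might look roughly like the following sketch. The esm imports, the "esm3-open" model name, and the GenerationConfig arguments follow the pattern in the project's published quickstart as best recalled, so check them against the current ESM3 Quickstart before relying on them. The mask_region helper and the generate_sequence wrapper are hypothetical names introduced here, and the use of "_" for masked positions is assumed from the quickstart prompt format.

```python
MASK = "_"


def mask_region(sequence: str, start: int, end: int) -> str:
    """Replace residues in [start, end) with the mask character."""
    if not 0 <= start <= end <= len(sequence):
        raise ValueError("need 0 <= start <= end <= len(sequence)")
    return sequence[:start] + MASK * (end - start) + sequence[end:]


def generate_sequence(prompt: str, device: str = "cuda", num_steps: int = 8) -> str:
    """Fill the masked positions of prompt with a locally loaded ESM3 model.

    Assumes `pip install esm` has been run and that you have already
    authenticated with the Hugging Face Hub (for example via
    huggingface_hub.login()), since the weights are downloaded on first use.
    """
    # Deferred imports so this file can be loaded without esm installed.
    from esm.models.esm3 import ESM3
    from esm.sdk.api import ESMProtein, GenerationConfig

    model = ESM3.from_pretrained("esm3-open").to(device)
    protein = ESMProtein(sequence=prompt)
    protein = model.generate(
        protein,
        GenerationConfig(track="sequence", num_steps=num_steps, temperature=0.7),
    )
    return protein.sequence


if __name__ == "__main__":
    # Mask positions 1 to 3 of a short toy sequence and ask the model to fill them.
    prompt = mask_region("ACDEFG", 1, 4)
    print(prompt)
    print(generate_sequence(prompt))
```

Keeping the prompt construction separate from the model call makes it possible to check the masking logic without a GPU or downloaded weights.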

Highlighted Details

  • The 98B-parameter ESM3 model was trained with 1.07e24 FLOPs.
  • ESM C 6B outperforms ESM2 15B.
  • Flash Attention support for ESM C via pip install flash-attn.
  • Forge Batch Executor for efficient concurrent processing.
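
For the representation learning side, a minimal sketch of extracting ESM C embeddings is shown here. The ESMC, ESMProtein, and LogitsConfig names and the "esmc_300m" model identifier follow the project's published quickstart as best recalled and should be verified against the ESM C Quickstart. The embed_sequence wrapper and the mean_pool helper are hypothetical, and mean pooling is only one common way to turn per-residue vectors into a single protein-level vector, not something the source prescribes.

```python
def embed_sequence(sequence: str, model_name: str = "esmc_300m", device: str = "cuda"):
    """Return the embeddings ESM C produces for one protein sequence.

    Assumes `pip install esm` has been run; weights are downloaded on first use.
    """
    # Deferred imports so this file can be loaded without esm installed.
    from esm.models.esmc import ESMC
    from esm.sdk.api import ESMProtein, LogitsConfig

    client = ESMC.from_pretrained(model_name).to(device)
    protein_tensor = client.encode(ESMProtein(sequence=sequence))
    output = client.logits(
        protein_tensor, LogitsConfig(sequence=True, return_embeddings=True)
    )
    return output.embeddings


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]


if __name__ == "__main__":
    embeddings = embed_sequence("AAAAA")
    # Assumption: the first dimension is a batch of one, so index 0 selects
    # the per-position vectors for this protein.
    pooled = mean_pool(embeddings[0].tolist())
    print(len(pooled))
```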

Maintenance & Community

  • Developed by EvolutionaryScale, a public benefit company.
  • Follows a Responsible Development Framework.
  • Citations provided for ESM3 and ESM C models.

Licensing & Compatibility

  • Code and weights are under a mixture of non-commercial and permissive commercial licenses. Refer to LICENSE.md for details.
  • SageMaker deployment is under the Cambrian Inference Clickthrough License Agreement, allowing commercial use.

Limitations & Caveats

  • Local execution requires significant computational resources, especially for larger models.
  • Forge and SageMaker deployments involve external service dependencies and potential costs.
  • Specific model versions and their availability may change.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 week
  • Pull requests (30d): 1
  • Issues (30d): 5
  • Star history: 141 stars in the last 90 days
