Serving Hugging Face models on Amazon SageMaker
This library provides an open-source toolkit for deploying Hugging Face Transformers and Diffusers models on Amazon SageMaker, simplifying the inference process for developers and researchers. It offers default pre-processing, prediction, and post-processing for common Hugging Face models and tasks, leveraging the SageMaker Inference Toolkit for efficient model serving.
How It Works
The toolkit integrates with the SageMaker Inference Toolkit to manage model server startup and inference requests. It uses environment variables such as HF_TASK and HF_MODEL_ID to automatically configure and load models from the Hugging Face Hub. Users can also provide custom inference logic by overriding default handler methods or by including a code/inference.py script within their model artifacts.
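As an illustration of the custom-logic path, here is a minimal sketch of what a code/inference.py script might look like. The function names follow the handler methods the toolkit lets you override; the "text-classification" task, the JSON-only content type, and the "inputs"/"parameters" payload keys are assumptions chosen for the example, not requirements stated above.

```python
# code/inference.py (illustrative sketch, not the toolkit's default handlers)
import json


def model_fn(model_dir):
    """Load the model from the unpacked model artifacts."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    # Assumption: the artifacts hold a text-classification model.
    return pipeline("text-classification", model=model_dir)


def input_fn(input_data, content_type):
    """Deserialize the request body into a Python object."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    return json.loads(input_data)


def predict_fn(data, model):
    """Run the model on the deserialized request."""
    inputs = data["inputs"]
    parameters = data.get("parameters", {})
    return model(inputs, **parameters)


def output_fn(prediction, accept):
    """Serialize the prediction for the response."""
    return json.dumps(prediction)
```

Only the functions you define replace the defaults, so a script that needs custom loading but standard serialization could define model_fn alone.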
Quick Start & Requirements
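Install or upgrade the SageMaker Python SDK:
pip install sagemaker --upgrade
Deploy a trained model from Amazon S3:
from sagemaker.huggingface import HuggingFaceModel
# role must be an IAM role with SageMaker permissions, defined beforehand
huggingface_model = HuggingFaceModel(
    transformers_version='4.6', pytorch_version='1.7', py_version='py36',
    model_data='s3://my-trained-model/artifacts/model.tar.gz', role=role,
)
huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
Deploy a model directly from the Hugging Face Hub:
from sagemaker.huggingface import HuggingFaceModel
# Hub model ID and task, passed to the container as environment variables
hub = {'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', 'HF_TASK':'question-answering'}
huggingface_model = HuggingFaceModel(
    transformers_version='4.6', pytorch_version='1.7', py_version='py36',
    env=hub, role=role,
)
huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")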
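Once an endpoint is deployed, requests are typically sent as JSON. The sketch below builds a request body for the question-answering task used in the Hub example above; the helper name build_qa_payload and the sample strings are hypothetical, and the "inputs" layout with "question" and "context" keys is the shape commonly used for Hugging Face question-answering pipelines rather than something specified in this summary.

```python
import json


def build_qa_payload(question, context):
    """Build a question-answering request body (hypothetical helper)."""
    return {"inputs": {"question": question, "context": context}}


payload = build_qa_payload(
    "What does the toolkit serve?",
    "The toolkit serves Hugging Face models on Amazon SageMaker.",
)
body = json.dumps(payload)
print(body)

# With a live endpoint, the object returned by deploy() can send the payload:
# predictor = huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
# result = predictor.predict(payload)
```

The call to predict is left commented out because it needs an AWS account and a running endpoint.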
Highlighted Details
Environment variables HF_OPTIMUM_BATCH_SIZE and HF_OPTIMUM_SEQUENCE_LENGTH for setting the batch size and sequence length used with Optimum.
A custom inference script (code/inference.py) that can override model_fn, transform_fn, input_fn, predict_fn, and output_fn.
HF_MODEL_REVISION for pinning model versions and HF_API_TOKEN for private models.
Maintenance & Community
This project is part of the AWS Deep Learning Containers ecosystem. Contribution guidelines are available in CONTRIBUTING.md.
Licensing & Compatibility
Licensed under the Apache 2.0 License. Compatible with commercial use.
Limitations & Caveats
The Hugging Face Hub deployment is noted as experimental and may not support all SageMaker features, such as Multi-Model Endpoints (MME).