sagemaker-huggingface-inference-toolkit by aws

Serving Hugging Face models on Amazon SageMaker

created 4 years ago
264 stars

Top 97.5% on sourcepulse

Project Summary

This library provides an open-source toolkit for deploying Hugging Face Transformers and Diffusers models on Amazon SageMaker, simplifying the inference process for developers and researchers. It offers default pre-processing, prediction, and post-processing for common Hugging Face models and tasks, leveraging the SageMaker Inference Toolkit for efficient model serving.

How It Works

The toolkit builds on the SageMaker Inference Toolkit to manage model-server startup and inference requests. It uses environment variables such as HF_TASK and HF_MODEL_ID to automatically configure and load models from the Hugging Face Hub. Users can also supply custom inference logic by including a code/inference.py script in their model artifacts that overrides the default handler methods (model_fn, input_fn, predict_fn, output_fn, or transform_fn).
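The override mechanism can be sketched as a minimal code/inference.py. The sentiment-analysis task and the JSON handling below are illustrative assumptions, not the toolkit's exact defaults:

```python
# code/inference.py -- a minimal sketch of custom handlers.
# Only the functions you define are overridden; the toolkit keeps its
# defaults for any handler you leave out.
import json

def model_fn(model_dir):
    # Called once at startup with the unpacked model.tar.gz directory.
    # Loading a transformers pipeline here is an illustrative choice.
    from transformers import pipeline
    return pipeline("sentiment-analysis", model=model_dir)

def predict_fn(data, model):
    # `data` is the deserialized request body; the default JSON input_fn
    # yields a dict with an "inputs" key.
    return model(data["inputs"])

def output_fn(prediction, accept):
    # Serialize the prediction for the response body.
    return json.dumps(prediction)
```

Packaging this script under code/ inside the model archive is enough; no custom container image is required.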

Quick Start & Requirements

  • Install: pip install sagemaker --upgrade
  • Deploy from S3:
    from sagemaker import get_execution_role
    from sagemaker.huggingface import HuggingFaceModel

    role = get_execution_role()  # or pass an explicit IAM role ARN
    huggingface_model = HuggingFaceModel(
        transformers_version='4.6', pytorch_version='1.7', py_version='py36',
        model_data='s3://my-trained-model/artifacts/model.tar.gz', role=role,
    )
    huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    
  • Deploy from Hugging Face Hub (experimental):
    from sagemaker import get_execution_role
    from sagemaker.huggingface import HuggingFaceModel

    role = get_execution_role()  # or pass an explicit IAM role ARN
    hub = {'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', 'HF_TASK':'question-answering'}
    huggingface_model = HuggingFaceModel(
        transformers_version='4.6', pytorch_version='1.7', py_version='py36',
        env=hub, role=role,
    )
    huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    
  • Documentation: SageMaker Notebook Examples
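Either deployment can then be invoked through the predictor returned by deploy(). The payload below follows the Hugging Face question-answering request format; the question and context strings are illustrative:

```python
# Request payload for a question-answering endpoint; `predictor` is the
# object returned by huggingface_model.deploy(...) above.
payload = {
    "inputs": {
        "question": "Which service hosts the model?",
        "context": "The model was deployed to an Amazon SageMaker endpoint.",
    }
}
# predictor.predict(payload)  # returns an answer span with a score
```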

Highlighted Details

  • Supports deployment of Hugging Face models on AWS Inferentia2, with options for pre-compiled models or on-the-fly compilation using HF_OPTIMUM_BATCH_SIZE and HF_OPTIMUM_SEQUENCE_LENGTH.
  • Allows customization of inference logic through user-defined scripts (code/inference.py) that can override model_fn, transform_fn, input_fn, predict_fn, and output_fn.
  • Environment variables simplify configuration, including HF_MODEL_REVISION for pinning model versions and HF_API_TOKEN for private models.
  • Provides local testing capabilities by running the inference server directly via Python.
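The environment variables from the bullets above can be combined into a single env dict passed to HuggingFaceModel. The revision, token lookup, and Optimum values here are placeholders, not recommended settings:

```python
import os

# Consolidated `env` dict for HuggingFaceModel(env=hub, ...); values are
# illustrative placeholders.
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-distilled-squad",
    "HF_TASK": "question-answering",
    "HF_MODEL_REVISION": "main",  # pin a branch, tag, or commit on the Hub
    "HF_API_TOKEN": os.environ.get("HF_API_TOKEN", ""),  # needed for private models
    # For on-the-fly compilation on AWS Inferentia2:
    "HF_OPTIMUM_BATCH_SIZE": "1",
    "HF_OPTIMUM_SEQUENCE_LENGTH": "128",
}
```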

Maintenance & Community

This project is part of the AWS Deep Learning Containers ecosystem. Contribution guidelines are available in CONTRIBUTING.md.

Licensing & Compatibility

Licensed under the Apache 2.0 License. Compatible with commercial use.

Limitations & Caveats

The Hugging Face Hub deployment is noted as experimental and may not support all SageMaker features, such as Multi-Model Endpoints (MME).

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 9 stars in the last 90 days

