sagemaker-huggingface-inference-toolkit by aws

Serving Hugging Face models on Amazon SageMaker

Created 4 years ago
266 stars

Top 96.2% on SourcePulse

Project Summary

This open-source library simplifies deploying Hugging Face Transformers and Diffusers models for inference on Amazon SageMaker. It provides default pre-processing, prediction, and post-processing for common Hugging Face models and tasks, building on the SageMaker Inference Toolkit for efficient model serving.

How It Works

The toolkit integrates with the SageMaker Inference Toolkit to manage model-server startup and inference requests. It uses environment variables such as HF_TASK and HF_MODEL_ID to configure and load models from the Hugging Face Hub automatically. Users can supply custom inference logic by including a code/inference.py script in their model artifacts that overrides the default handler methods.
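A minimal sketch of such a script, assuming the question-answering task. The handler names (model_fn, predict_fn) are the toolkit's documented override points; everything else here (the pipeline task, the payload keys) is illustrative:

```python
# code/inference.py -- minimal custom-handler sketch.
# If these functions are defined, the toolkit calls them in place of
# its defaults; omitted handlers (input_fn, output_fn) keep default behavior.

def model_fn(model_dir):
    # model_dir is the directory where model.tar.gz was unpacked.
    from transformers import pipeline
    return pipeline("question-answering", model=model_dir)

def predict_fn(data, model):
    # data is the deserialized request body produced by input_fn.
    return model(question=data["question"], context=data["context"])
```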

Quick Start & Requirements

  • Install: pip install sagemaker --upgrade
  • Deploy from S3:
    from sagemaker.huggingface import HuggingFaceModel

    # role: an IAM role ARN with SageMaker permissions,
    # e.g. sagemaker.get_execution_role() inside a notebook
    huggingface_model = HuggingFaceModel(
        transformers_version='4.6', pytorch_version='1.7', py_version='py36',
        model_data='s3://my-trained-model/artifacts/model.tar.gz', role=role,
    )
    huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    
  • Deploy from Hugging Face Hub (experimental):
    hub = {'HF_MODEL_ID':'distilbert-base-uncased-distilled-squad', 'HF_TASK':'question-answering'}
    huggingface_model = HuggingFaceModel(
        transformers_version='4.6', pytorch_version='1.7', py_version='py36',
        env=hub, role=role,
    )
    huggingface_model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
    
  • Documentation: SageMaker Notebook Examples
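Either deploy call above returns a predictor object. A hedged invocation sketch, assuming the question-answering task's default payload shape and that `predictor` came from `deploy(...)`:

```python
def invoke_qa(predictor, question, context):
    # Build the JSON payload the default question-answering handler
    # expects, then send it to the endpoint via the SDK predictor.
    payload = {"inputs": {"question": question, "context": context}}
    return predictor.predict(payload)
```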

Highlighted Details

  • Supports deployment of Hugging Face models on AWS Inferentia2, with options for pre-compiled models or on-the-fly compilation using HF_OPTIMUM_BATCH_SIZE and HF_OPTIMUM_SEQUENCE_LENGTH.
  • Allows customization of inference logic through user-defined scripts (code/inference.py) that can override model_fn, transform_fn, input_fn, predict_fn, and output_fn.
  • Environment variables simplify configuration, including HF_MODEL_REVISION for pinning model versions and HF_API_TOKEN for private models.
  • Provides local testing capabilities by running the inference server directly via Python.
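The environment variables listed above can be combined. A sketch of a configuration for a version-pinned Hub model (all values are illustrative, not defaults):

```python
import os

# Variable names are those listed above; values are illustrative.
config = {
    "HF_TASK": "question-answering",  # selects the default pipeline
    "HF_MODEL_ID": "distilbert-base-uncased-distilled-squad",  # Hub model to load
    "HF_MODEL_REVISION": "main",      # pin a branch, tag, or commit sha
    # "HF_API_TOKEN": "hf_...",       # only needed for private models
}
os.environ.update(config)
```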

Maintenance & Community

This project is part of the AWS Deep Learning Containers ecosystem. Contribution guidelines are available in CONTRIBUTING.md.

Licensing & Compatibility

Licensed under the Apache 2.0 License. Compatible with commercial use.

Limitations & Caveats

The Hugging Face Hub deployment is noted as experimental and may not support all SageMaker features, such as Multi-Model Endpoints (MME).

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Starred by Amanpreet Singh (Cofounder of Contextual AI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 7 more.
