# Tensorizer: fast model serialization/deserialization
This library provides a fast and efficient method for serializing and deserializing large machine learning models and their tensors. It targets ML engineers and researchers deploying models, enabling significantly reduced model load times from various storage backends like HTTP/S, Redis, and S3.
## How It Works
Tensorizer serializes model weights into a single, optimized file. This decouples model artifacts from container images, reducing image size and deployment latency. Deserialization is network-bound rather than compute-bound, so loading approaches wire speed on high-speed networks. The library also supports streaming loads directly from S3 or HTTP/S endpoints without requiring local disk storage.
## Quick Start & Requirements
```shell
python -m pip install tensorizer
```
The examples require the `transformers` and `accelerate` libraries for model serialization/deserialization. S3 access uses configured credentials (e.g. via `~/.s3cfg`) or direct credential passing.

## Highlighted Details
- `state_dict` compatibility for `torch.nn.Module.load_state_dict`
## Maintenance & Community
## Licensing & Compatibility
## Limitations & Caveats
- Quantized tensor types (e.g. `qint8`) are not currently supported due to missing quantization parameters.
- `plaid_mode` is deprecated and has no effect.