# Tensorizer: fast model serialization/deserialization
This library provides a fast and efficient method for serializing and deserializing large machine learning models and their tensors. It targets ML engineers and researchers deploying models, enabling significantly reduced model load times from various storage backends like HTTP/S, Redis, and S3.
## How It Works
Tensorizer serializes model weights into a single, optimized file. This decouples model artifacts from container images, reducing image size and deployment latency. Deserialization is network-bound rather than compute-bound, so loading approaches wire speed on high-speed networks. The library also supports streaming loads directly from S3 or HTTP/S endpoints without requiring local disk storage.
## Quick Start & Requirements
```shell
python -m pip install tensorizer
```
The examples require the `transformers` and `accelerate` libraries for model serialization/deserialization. S3 access uses configured credentials (e.g. via `~/.s3cfg`) or direct credential passing.

## Highlighted Details
- `state_dict` compatibility for `torch.nn.Module.load_state_dict`
## Maintenance & Community
## Licensing & Compatibility
## Limitations & Caveats
- Quantized tensor types (e.g. `qint8`) are not currently supported due to missing quantization parameters.
- `plaid_mode` is deprecated and has no effect.