cloudpickle by cloudpipe

Extended pickling support for Python objects

created 10 years ago
1,796 stars

Top 24.5% on sourcepulse

Project Summary

cloudpickle extends Python's built-in pickle module to serialize a wider range of Python objects, particularly useful for distributed computing and interactive environments like Jupyter notebooks. It enables the serialization of lambda functions and objects defined in __main__, addressing limitations of standard pickling for dynamic code execution.
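Here is a minimal sketch of the difference, assuming cloudpickle is installed. The standard library refuses to pickle a lambda because the lambda cannot be looked up by name, while cloudpickle serializes it by value. The resulting bytes are ordinary pickle data, so `pickle.loads` can read them as long as cloudpickle is importable on the loading side.

```python
import pickle

import cloudpickle

square = lambda x: x * x  # a lambda has no importable name

# The standard pickler stores functions by reference (module + name),
# and "<lambda>" cannot be looked up, so this fails.
try:
    pickle.dumps(square)
    stdlib_failed = False
except (pickle.PicklingError, AttributeError):
    stdlib_failed = True

# cloudpickle embeds the function's code in the payload instead.
payload = cloudpickle.dumps(square)
restored = pickle.loads(payload)

print(stdlib_failed)  # True
print(restored(4))    # 16
```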

How It Works

cloudpickle extends serialization mainly by implementing "serialization by value" for functions and classes. Standard pickle uses "serialization by reference": it stores only the module and the name, so the unpickling environment must be able to import that module. cloudpickle can instead embed the actual code of a function or class in the payload. It does this by default for lambdas and for functions and classes defined in __main__, while functions and classes from importable modules are still pickled by reference. Serialization by value is particularly useful in cluster computing, where remote workers may not have the same modules or environment as the client. Explicit registration with register_pickle_by_value lets users opt in to serialization by value for specific modules, which simplifies deployment in distributed systems.
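The sketch below contrasts the two modes for a module function, assuming cloudpickle 2.0 or later, where `register_pickle_by_value` and `unregister_pickle_by_value` are available. The module name `my_helpers` is hypothetical and the module is built in memory so the example stays self-contained. Deleting it from `sys.modules` simulates a remote worker that does not have the module installed.

```python
import pickle
import sys
import types

import cloudpickle

# Hypothetical module "my_helpers", built in memory so the sketch is
# self-contained; in practice this would be a local .py file.
helpers = types.ModuleType("my_helpers")
exec("def double(x):\n    return 2 * x\n", helpers.__dict__)
sys.modules["my_helpers"] = helpers

# Default: the module is importable, so only "my_helpers.double" is stored.
by_reference = cloudpickle.dumps(helpers.double)

# Opt in: functions from this module are now serialized with their code.
cloudpickle.register_pickle_by_value(helpers)
by_value = cloudpickle.dumps(helpers.double)
cloudpickle.unregister_pickle_by_value(helpers)

# Simulate a remote worker where my_helpers is not installed.
del sys.modules["my_helpers"]

print(pickle.loads(by_value)(21))  # 42

# The by-reference payload needs to import my_helpers, which now fails.
try:
    pickle.loads(by_reference)
    reference_failed = False
except ImportError:
    reference_failed = True
print(reference_failed)  # True
```

Registration is scoped per module, so the rest of the program keeps the cheaper by-reference behaviour.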

Quick Start & Requirements

  • Install: pip install cloudpickle
  • Requirements: Python 3.x. cloudpickle supports several Python versions, but an object serialized with one Python version can only be reliably loaded by the exact same Python version.
  • Links: PyPI
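Because payloads are tied to one interpreter version, a caller that controls both ends might record the version next to each payload. This is a minimal sketch. `dumps_tagged` and `loads_tagged` are hypothetical helpers, not part of cloudpickle's API. The check conservatively requires the major, minor and micro versions to match. It does not make untrusted input safe, since the outer `pickle.loads` can still execute code.

```python
import pickle
import sys

import cloudpickle


def dumps_tagged(obj):
    """Hypothetical helper: bundle a cloudpickle payload with the interpreter version."""
    return pickle.dumps((tuple(sys.version_info[:3]), cloudpickle.dumps(obj)))


def loads_tagged(blob):
    """Hypothetical helper: refuse payloads produced by a different Python version."""
    version, payload = pickle.loads(blob)
    current = tuple(sys.version_info[:3])
    if version != current:
        raise RuntimeError(f"payload from Python {version}, running {current}")
    return pickle.loads(payload)


blob = dumps_tagged(lambda s: s[::-1])
print(loads_tagged(blob)("abc"))  # cba
```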

Highlighted Details

  • Supports pickling lambda functions and interactively defined functions/classes.
  • Offers "serialization by value" as an alternative to pickle's "serialization by reference."
  • Explicit API (register_pickle_by_value) to control serialization behavior for modules.
  • Originally developed by PiCloud and integrated into Apache Spark.
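The same mechanism covers classes that cannot be imported by name. This is a minimal sketch, assuming cloudpickle is installed. `make_counter_class` and `Counter` are hypothetical names, and a class created inside a function stands in for one typed into a notebook cell. The class definition travels in the payload together with the instance's state.

```python
import pickle

import cloudpickle


def make_counter_class():
    # A class created inside a function has a "<locals>" qualified name,
    # so it cannot be found by import at load time.
    class Counter:
        def __init__(self):
            self.count = 0

        def bump(self):
            self.count += 1
            return self.count

    return Counter


Counter = make_counter_class()
original = Counter()
original.bump()

# cloudpickle serializes the class definition by value together with the
# instance's __dict__, so the copy keeps both its methods and its state.
clone = pickle.loads(cloudpickle.dumps(original))
print(clone.bump())    # 2
print(original.count)  # 1
```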

Maintenance & Community

  • Active development with contributions from Apache Spark developers.
  • Tests available via tox for multiple Python versions.
  • Links: GitHub

Licensing & Compatibility

  • License: BSD 3-Clause.
  • Compatibility: Suitable for commercial use and integration with closed-source projects. Pickled objects can only be reliably loaded by the exact same Python version that created them.

Limitations & Caveats

Serialization by value for modules registered with register_pickle_by_value is experimental. It may fail if the body of a function or class pickled by value contains an import statement, or if a function pickled by reference calls a function pickled by value during its execution. cloudpickle is not intended for long-term object storage. Loading data from untrusted sources is a security risk, because unpickling can execute arbitrary code.
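The security warning follows from how pickle works. An object's `__reduce__` can name any callable, and loading the data calls it. This minimal sketch uses a harmless `eval` of an arithmetic expression. `Payload` is a hypothetical class, and a hostile payload could name something like `os.system` instead. Because cloudpickle produces ordinary pickle data, the same applies to its output.

```python
import pickle


class Payload:
    # __reduce__ tells pickle to rebuild this object by calling
    # eval("6 * 7"); the class itself is not needed at load time.
    def __reduce__(self):
        return (eval, ("6 * 7",))


data = pickle.dumps(Payload())

# Merely loading the bytes runs the embedded call.
result = pickle.loads(data)
print(result)  # 42
```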

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 1
  • Issues (30d): 3
  • Star History: 49 stars in the last 90 days
