tensordict by pytorch

PyTorch tensor container for efficient ML data handling

Created 3 years ago

1,030 stars

Top 36.1% on SourcePulse

5 Experts Love This Project

Sasha Rush

Research Scientist at Cursor; Professor at Cornell Tech

Jeremy Howard

Cofounder of fast.ai

Ross Wightman

Author of timm; CV at Hugging Face

Soumith Chintala

Coauthor of PyTorch

and 1 more!

Project Summary

Summary

TensorDict reduces the bookkeeping involved in managing many PyTorch tensors in machine learning workflows. It provides a dictionary-like container that inherits tensor properties such as a batch shape and a device, so data-manipulation code becomes more compact, readable, and efficient, especially in research and large-scale ML applications.

How It Works

The library provides two abstractions: a specialized dictionary (TensorDict) and a tensor-aware dataclass (tensorclass). Both generalize standard PyTorch tensor operations (indexing, slicing, concatenation, and device casting) to collections of tensors, so a whole collection can be manipulated as a single unit. This removes repetitive per-tensor code and allows asynchronous device transfers. Nested structures are supported, and the containers are compatible with torch.compile and torch.vmap.

Quick Start & Requirements

Installation is straightforward via pip: pip install tensordict. For the latest features, use pip install tensordict-nightly. Conda users can install with conda install -c conda-forge tensordict. PyTorch is a core dependency. While basic usage doesn't require specific hardware, examples demonstrate CUDA acceleration.

Highlighted Details

Composability: Extends PyTorch tensor operations to collections, simplifying complex data structures.
Performance: Features asynchronous device transfers, fast inter-device communication, and compatibility with torch.compile for optimized execution.
Versatility: Supports nesting for hierarchical data, lazy preallocation, serialization, memory-mapping, and functional transforms such as torch.vmap.
Broad Ecosystem Integration: Widely adopted in Reinforcement Learning (TorchRL), LLM post-training, Robotics, Scientific ML, and Genomics.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README snippet.

Licensing & Compatibility

TensorDict is released under the permissive MIT License, allowing for broad compatibility with commercial and closed-source projects.

Limitations & Caveats

No explicit limitations, alpha status, or known issues were detailed in the provided README content. The availability of a -nightly build suggests active development.

Health Check

Last Commit

16 hours ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days