tensordict by pytorch

PyTorch tensor container for efficient ML data handling

Created 3 years ago
1,012 stars

Top 36.8% on SourcePulse

Project Summary

Summary

TensorDict addresses the complexity of managing multiple PyTorch tensors in machine learning workflows. It provides a dictionary-like container that inherits tensor properties, enabling developers to write more compact, readable, and efficient code for data manipulation, especially in research and large-scale ML applications.

How It Works

TensorDict introduces a specialized dictionary (TensorDict) and a tensor-aware dataclass (tensorclass). It generalizes standard PyTorch tensor operations—such as indexing, slicing, concatenation, and device casting—to collections of tensors, allowing them to be manipulated as a single unit. This approach simplifies code by abstracting repetitive operations and enables asynchronous device transfers for performance gains, while also supporting nested structures and compatibility with torch.compile and torch.vmap.

Quick Start & Requirements

Installation is straightforward via pip: pip install tensordict. For the latest features, use pip install tensordict-nightly. Conda users can install with conda install -c conda-forge tensordict. PyTorch is a core dependency. While basic usage doesn't require specific hardware, examples demonstrate CUDA acceleration.

Highlighted Details

  • Composability: Extends PyTorch tensor operations to collections, simplifying complex data structures.
  • Performance: Features asynchronous device transfers, fast inter-device communication, and compatibility with torch.compile for optimized execution.
  • Versatility: Supports nesting for hierarchical data, lazy preallocation, serialization, memory-mapping, and functional programming paradigms like torch.vmap.
  • Broad Ecosystem Integration: Widely adopted in Reinforcement Learning (TorchRL), LLM post-training, Robotics, Scientific ML, and Genomics.

Maintenance & Community

No specific details regarding maintainers, community channels (like Discord/Slack), or roadmap were provided in the README snippet.

Licensing & Compatibility

TensorDict is released under the permissive MIT License, allowing for broad compatibility with commercial and closed-source projects.

Limitations & Caveats

No explicit limitations, alpha status, or known issues were detailed in the provided README content. The availability of a -nightly build suggests active development.

Health Check

  • Last commit: 20 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 106
  • Issues (30d): 6
  • Star history: 9 stars in the last 30 days
