TorchEasyRec by alibaba

PyTorch recommendation framework for production deep learning

Created 1 year ago
361 stars

Top 77.8% on SourcePulse

Project Summary

TorchEasyRec is a PyTorch-based framework designed for the efficient development and production deployment of large-scale recommendation system algorithms. It addresses the complexity of building state-of-the-art models for candidate generation, ranking, multi-task learning, and generative recommendation, offering a simplified configuration and customization approach for engineers and researchers. The framework aims to accelerate the creation of high-performance recommendation models ready for production environments.

How It Works

TorchEasyRec leverages PyTorch to implement deep learning models for various recommendation tasks. Its core design emphasizes ease of use through simple configuration files and straightforward customization of models and features. Key architectural choices include support for distributed training via hybrid data/model parallelism (using TorchRec), advanced large embedding management with sharding and eviction policies (LFU/LRU), and zero-collision hashing for dynamic embeddings. This approach facilitates scalability and efficient handling of massive datasets and models.
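To make the embedding-management ideas above concrete, here is a minimal, stdlib-only sketch of a collision-free ID-to-row mapping that reclaims rows with an LRU or LFU policy once the table is full. This is not TorchEasyRec's or TorchRec's API. The class ZchEmbeddingTable and its methods are hypothetical, plain Python lists stand in for a PyTorch embedding tensor, and re-initializing a reclaimed row with small random values is an assumption.

```python
import random
from collections import OrderedDict


class ZchEmbeddingTable:
    """Hypothetical sketch of a zero-collision embedding table.

    Each live raw ID owns exactly one row, unlike modulo hashing
    (raw_id % num_rows) where unrelated IDs can share a row. When every
    row is taken, one row is reclaimed with an LRU or LFU policy.
    """

    def __init__(self, num_rows: int, dim: int, policy: str = "lru") -> None:
        if policy not in ("lru", "lfu"):
            raise ValueError("policy must be 'lru' or 'lfu'")
        self.dim = dim
        self.policy = policy
        # Key order of this dict tracks recency (least recently used first).
        self.id_to_row: "OrderedDict[int, int]" = OrderedDict()
        self.freq: dict = {}  # raw ID -> access count, used by LFU
        self.free_rows = list(range(num_rows - 1, -1, -1))
        self.weights = [[0.0] * dim for _ in range(num_rows)]

    def _evict(self) -> int:
        """Remove one raw ID and return the row it occupied."""
        if self.policy == "lru":
            victim, row = self.id_to_row.popitem(last=False)
        else:
            # Least frequently used; ties go to the least recently used ID.
            victim = min(self.id_to_row, key=lambda k: self.freq[k])
            row = self.id_to_row.pop(victim)
        del self.freq[victim]
        return row

    def lookup(self, raw_id: int) -> list:
        """Return the embedding row for raw_id, admitting the ID if unseen."""
        if raw_id in self.id_to_row:
            self.id_to_row.move_to_end(raw_id)  # mark as most recently used
        else:
            row = self.free_rows.pop() if self.free_rows else self._evict()
            # Assumption: a newly assigned row gets small random values.
            self.weights[row] = [random.uniform(-0.01, 0.01) for _ in range(self.dim)]
            self.id_to_row[raw_id] = row
            self.freq[raw_id] = 0
        self.freq[raw_id] += 1
        return self.weights[self.id_to_row[raw_id]]


stream = (1001, 1001, 2002, 3003)  # 3003 arrives when both rows are taken
for policy in ("lru", "lfu"):
    table = ZchEmbeddingTable(num_rows=2, dim=4, policy=policy)
    for raw_id in stream:
        table.lookup(raw_id)
    print(policy, sorted(table.id_to_row))
# lru [2002, 3003]  (1001 was used least recently)
# lfu [1001, 3003]  (2002 was used least often)
```

The contrast with plain hashing is the point of the design: raw_id % num_rows lets unrelated IDs silently share and overwrite one row, while an explicit mapping guarantees one live ID per row and pays for that guarantee with an eviction decision when capacity runs out.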

Quick Start & Requirements

Getting started involves following the tutorials for local or cloud-based training environments. Specific installation commands are not detailed, but the framework is PyTorch-based. Expect to need an environment that supports distributed training, with enough memory for potentially large embedding tables. Alibaba Cloud services such as MaxCompute, PAI-DLC, and EAS are integrated, suggesting an optimized experience within that ecosystem.

Highlighted Details

  • Extensive Model Zoo: Includes over 20 battle-tested models such as DSSM, TDM, DeepFM, DIN, MMoE, PLE, PEPNet, and DLRM-HSTU, covering candidate generation, ranking, multi-task learning, and generative recommendation.
  • Flexible Data Handling: Supports diverse data sources including MaxCompute/ODPS, Parquet (with auto-rebalancing), CSV, and Kafka streaming.
  • Production-Ready Features: Offers consistent feature generation between training and serving, EAS deployment for auto-scaling model serving, and TensorRT/AOTInductor acceleration for inference.
  • Scalability: Implements distributed training, row/column/table-wise sharding for large embeddings, and zero-collision hashing with eviction policies.
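The three sharding schemes in the Scalability bullet differ in how a single embedding table is cut up across devices. The sketch below assumes contiguous, evenly split shards and shows which device holds which rows and columns, and which devices a single ID lookup must touch. Shard, plan_shards, and devices_for_row are hypothetical names for illustration only; in practice TorchRec's planner decides placement, and TorchEasyRec's own interface for this is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    device: int
    rows: range  # embedding-table rows stored on this device
    cols: range  # embedding columns (dimensions) stored on this device


def _split(n: int, parts: int) -> list:
    """Cut range(n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


def plan_shards(num_rows: int, dim: int, world_size: int, scheme: str,
                table_device: int = 0) -> list:
    """Hypothetical placement of one num_rows x dim embedding table."""
    if scheme == "table_wise":  # whole table on a single device
        return [Shard(table_device, range(num_rows), range(dim))]
    if scheme == "row_wise":  # each device holds a slice of the IDs
        return [Shard(d, rows, range(dim))
                for d, rows in enumerate(_split(num_rows, world_size))]
    if scheme == "column_wise":  # each device holds a slice of every vector
        return [Shard(d, range(num_rows), cols)
                for d, cols in enumerate(_split(dim, world_size))]
    raise ValueError(f"unknown scheme: {scheme}")


def devices_for_row(shards: list, row: int) -> list:
    """Devices that must be queried to assemble the full vector for `row`."""
    return [s.device for s in shards if row in s.rows]


for scheme in ("table_wise", "row_wise", "column_wise"):
    shards = plan_shards(num_rows=10, dim=8, world_size=4, scheme=scheme)
    print(scheme, devices_for_row(shards, 7))
# table_wise [0]
# row_wise [2]
# column_wise [0, 1, 2, 3]
```

The trade-off shows up in devices_for_row: row-wise sharding spreads a huge ID space so each lookup touches one device, column-wise sharding makes every lookup touch all devices but gives each a thinner table, and table-wise sharding keeps smaller tables whole on one device.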

Maintenance & Community

The project is developed by the Alibaba PAI Team. Community support and bug reporting are primarily handled through GitHub Issues. For direct interaction and enterprise service inquiries, users can join DingTalk groups: 32260796 and 37930014162. Contributions are welcomed via pull requests, with a development guide available for more details.

Licensing & Compatibility

TorchEasyRec is released under the Apache License 2.0. This license is generally permissive for commercial use and integration into closed-source projects. However, users should be aware that third-party libraries integrated within the framework may carry different licenses.

Limitations & Caveats

The provided README does not explicitly document any limitations, alpha status, or known bugs. The framework appears geared towards production use and is tightly integrated with Alibaba Cloud services, so users outside that ecosystem may find the experience less streamlined.

Health Check

Last Commit: 23 hours ago
Responsiveness: Inactive
Pull Requests (30d): 48
Issues (30d): 5
Star History: 17 stars in the last 30 days
