pytorch-frame by pyg-team

PyTorch extension for heterogeneous tabular deep learning

Created 2 years ago

762 stars

Top 45.8% on SourcePulse


2 Experts Love This Project

Luis Capelo

Cofounder of Lightning AI

Jeff Hammerbacher

Cofounder of Cloudera

Project Summary

PyTorch Frame is a modular deep learning library, built on PyTorch, for creating and training neural network models on heterogeneous tabular data. It targets researchers and practitioners who want to apply deep learning to tabular datasets, and it provides a flexible framework that handles multiple column types and includes state-of-the-art architectures.

How It Works

A model is assembled from three modular components: FeatureEncoder, TableConv, and Decoder. FeatureEncoder transforms raw tabular data into embeddings, TableConv models interactions between features, and Decoder produces the final output. Because each component can be swapped independently, it is easy to experiment with different model architectures and to combine models with other PyTorch libraries, such as PyG for graph neural networks.
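
The three-stage flow described above can be illustrated with a small, dependency-free sketch. It uses plain Python lists instead of tensors so the shapes are easy to follow. The classes ToyFeatureEncoder, ToyTableConv, and ToyDecoder, the column names, and the embedding width are all hypothetical; they are not PyTorch Frame's classes or API, and the attention-style mixing is just one assumed way for columns to interact.

```python
import math
import random

random.seed(0)  # fixed seed so the toy weights are reproducible

DIM = 4  # embedding width per column (arbitrary value for illustration)


def rand_vec(n):
    """Return n random weights in [-1, 1]."""
    return [random.uniform(-1.0, 1.0) for _ in range(n)]


class ToyFeatureEncoder:
    """Stage 1: map every raw column value to a DIM-wide embedding."""

    def __init__(self, num_cols, cat_cols):
        # One (weight, bias) pair per numerical column.
        self.num_params = {c: (rand_vec(DIM), rand_vec(DIM)) for c in num_cols}
        # One lookup table per categorical column: category -> vector.
        self.cat_tables = {
            c: {v: rand_vec(DIM) for v in vocab} for c, vocab in cat_cols.items()
        }

    def __call__(self, row):
        out = []
        for c, (w, b) in self.num_params.items():
            out.append([wi * row[c] + bi for wi, bi in zip(w, b)])
        for c, table in self.cat_tables.items():
            out.append(list(table[row[c]]))
        return out  # shape: [num_columns][DIM]


class ToyTableConv:
    """Stage 2: let columns interact via softmax attention across columns."""

    def __call__(self, x):
        scale = math.sqrt(len(x[0]))
        out = []
        for q in x:
            scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
            top = max(scores)
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            attn = [e / total for e in exps]
            mixed = [sum(a * v[d] for a, v in zip(attn, x)) for d in range(len(q))]
            # Residual connection: keep the column's own embedding.
            out.append([qi + mi for qi, mi in zip(q, mixed)])
        return out  # same shape as the input


class ToyDecoder:
    """Stage 3: mean-pool over columns, then project to one scalar."""

    def __init__(self):
        self.w = rand_vec(DIM)

    def __call__(self, x):
        pooled = [sum(col[d] for col in x) / len(x) for d in range(DIM)]
        return sum(w * p for w, p in zip(self.w, pooled))


encoder = ToyFeatureEncoder(
    num_cols=["age", "income"], cat_cols={"city": ["paris", "tokyo"]}
)
conv = ToyTableConv()
decoder = ToyDecoder()

row = {"age": 0.5, "income": -1.2, "city": "tokyo"}
embedded = encoder(row)      # 3 columns, each a DIM-wide vector
mixed = conv(embedded)       # columns after exchanging information
prediction = decoder(mixed)  # single scalar output
```

Each stage only depends on the shape produced by the previous one, which is what makes it possible to swap in a different interaction layer or output head without touching the rest.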

Quick Start & Requirements

Install: pip install pytorch-frame
Requirements: Python 3.9-3.13. GPU recommended for training.
Docs: https://pytorch-frame.github.io/
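
Before installing, it can help to confirm that the interpreter falls inside the supported range listed above. The following is a minimal sketch using only the standard library; the helper name is_supported is hypothetical and not part of the package.

```python
import sys

MIN_VERSION = (3, 9)   # lowest supported version listed above
MAX_VERSION = (3, 13)  # highest supported version listed above


def is_supported(version=None):
    """Return True if (major, minor) lies within the supported range.

    Defaults to the running interpreter when no version is given.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_supported():
    print("This Python version is outside the range listed for pytorch-frame.")
```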

Highlighted Details

Supports diverse column types: numerical, categorical, text, image, and embeddings.
Implements state-of-the-art deep tabular models (e.g., FTTransformer, TabNet) and integrates GBDTs (XGBoost, CatBoost, LightGBM).
Provides benchmark datasets and performance comparisons against GBDTs.
Facilitates integration with external embedding models (OpenAI, Cohere, Hugging Face) for text data.
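
Working with several column types means that each column has to be given a type before it can be encoded. The sketch below shows one naive way to assign such labels in plain Python. The function infer_column_types, its max_categories threshold, and the sample rows are hypothetical. The labels follow the column types listed above and are not the library's own identifiers.

```python
def infer_column_types(rows, max_categories=3):
    """Label each column 'numerical', 'categorical', or 'text'.

    Heuristic (an assumption, not the library's logic): numeric values are
    numerical; other columns with few distinct values are categorical;
    everything else is treated as free text.
    """
    types = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        if all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        ):
            types[col] = "numerical"
        elif len(set(values)) <= max_categories:
            types[col] = "categorical"
        else:
            types[col] = "text"
    return types


rows = [
    {"price": 9.5, "color": "red", "review": "Works well, arrived early."},
    {"price": 12.0, "color": "blue", "review": "Stopped charging after a week."},
    {"price": 7.25, "color": "red", "review": "Decent value for the price."},
    {"price": 15.0, "color": "blue", "review": "Too loud for a small room."},
]
column_types = infer_column_types(rows, max_categories=3)
```

In practice an explicit, hand-written mapping is usually safer than a heuristic like this, since a numeric ID column or a short text column is easy to mislabel.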

Maintenance & Community

Community: Slack
Paper: arXiv:2404.00776

Licensing & Compatibility

License: MIT License.
Compatibility: The MIT license permits use in commercial and closed-source applications.

Limitations & Caveats

Deep tabular models show competitive performance in the project's benchmarks, but the same benchmarks indicate they can be significantly slower to train than GBDTs. Some models also have higher memory requirements: Trompt and FTTransformerBucket are reported as "OOM" (out of memory) on certain datasets.

Health Check

Last Commit

2 weeks ago

Responsiveness

1 day

Star History

22 stars in the last 30 days