TabFormer by IBM

PyTorch code for tabular transformers research paper

created 4 years ago
344 stars

Project Summary

This repository provides PyTorch code and data for TabFormer, a novel approach to modeling multivariate time series using hierarchical transformers. It addresses the challenge of representing tabular data for time series analysis, targeting researchers and practitioners in time series forecasting and sequence modeling. The benefit is a more effective way to capture complex temporal dependencies within tabular datasets.
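
A multivariate time series stored as a table becomes training data once rows are grouped per entity and cut into fixed-length windows. The sketch below illustrates that step under stated assumptions. `make_windows`, the column names, and the window length and stride are all hypothetical, and this is not the repository's preprocessing code.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def make_windows(
    rows: Sequence[Dict[str, object]],
    group_key: str,
    seq_len: int,
    stride: int,
) -> List[List[Dict[str, object]]]:
    """Group rows by `group_key` (keeping their original order) and cut each
    group into windows of `seq_len` consecutive rows, advancing by `stride`.
    Hypothetical helper for illustration only."""
    if seq_len <= 0 or stride <= 0:
        raise ValueError("seq_len and stride must be positive")
    groups: Dict[Hashable, List[Dict[str, object]]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    windows: List[List[Dict[str, object]]] = []
    for group_rows in groups.values():
        # Groups shorter than seq_len yield no window in this sketch.
        for start in range(0, len(group_rows) - seq_len + 1, stride):
            windows.append(group_rows[start:start + seq_len])
    return windows


if __name__ == "__main__":
    # Hypothetical toy table: time-ordered transactions for two users.
    table = [
        {"user": 0, "amount": 12.5, "mcc": 5411},
        {"user": 0, "amount": 3.0, "mcc": 5812},
        {"user": 1, "amount": 40.0, "mcc": 5541},
        {"user": 0, "amount": 7.25, "mcc": 5411},
        {"user": 0, "amount": 99.0, "mcc": 4829},
    ]
    for window in make_windows(table, group_key="user", seq_len=2, stride=1):
        print([r["amount"] for r in window])
```

On the toy table, user 0's four transactions produce three overlapping windows. User 1 has a single row and produces none, because this sketch drops groups shorter than `seq_len`.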

How It Works

TabFormer adapts the transformer architecture to tabular data by introducing specialized modules for hierarchical representation. It uses modified components from HuggingFace's Transformers library, including a Modified Adaptive Softmax that handles masking and a Modified DataCollatorForLanguageModeling tailored to tabular structures. Together these let transformers, which are usually applied to sequential text, process and learn from structured, multi-field tabular time series.
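
To make "tabular structure" concrete, one common way to feed rows to a BERT-style model works in four steps. First, quantize each field. Next, give every (field, value) pair its own token id. Then flatten a window of rows into one id sequence. Finally, mask tokens for a masked-LM objective. The stdlib sketch below illustrates those assumptions. `FieldVocab`, `mask_tokens`, and the special tokens are hypothetical, as is the simplification of always substituting `[MASK]` without BERT's 80/10/10 split. None of this is the repository's `DataCollatorForLanguageModeling` subclass. Labels use -100 for positions that should not contribute to the loss, following the HuggingFace convention.

```python
import random
from typing import Dict, List, Sequence, Tuple

IGNORE_INDEX = -100  # label value skipped by the loss (HuggingFace convention)


class FieldVocab:
    """Hypothetical vocabulary mapping (field, value) pairs to integer ids, so
    the same value in different fields gets distinct tokens. Ids 0 and 1 are
    reserved for [PAD] and [MASK]."""

    PAD, MASK = "[PAD]", "[MASK]"

    def __init__(self) -> None:
        self.token_to_id: Dict[Tuple[str, object], int] = {}
        for special in (self.PAD, self.MASK):
            self.token_to_id[("SPECIAL", special)] = len(self.token_to_id)

    @property
    def mask_id(self) -> int:
        return self.token_to_id[("SPECIAL", self.MASK)]

    def encode_row(self, row: Dict[str, object], fields: Sequence[str]) -> List[int]:
        """Encode one row in a fixed field order, adding unseen pairs."""
        ids = []
        for field in fields:
            key = (field, row[field])
            if key not in self.token_to_id:
                self.token_to_id[key] = len(self.token_to_id)
            ids.append(self.token_to_id[key])
        return ids


def mask_tokens(
    input_ids: Sequence[int], mask_id: int, mask_prob: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Select each position with probability `mask_prob`. Selected positions
    become `mask_id` in the inputs and keep their original id as the label;
    every other label is IGNORE_INDEX."""
    inputs, labels = [], []
    for tok in input_ids:
        if rng.random() < mask_prob:
            inputs.append(mask_id)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(IGNORE_INDEX)
    return inputs, labels


if __name__ == "__main__":
    vocab = FieldVocab()
    fields = ["amount_bin", "mcc"]  # hypothetical, already-quantized fields
    rows = [{"amount_bin": 3, "mcc": 5411}, {"amount_bin": 3, "mcc": 5812}]
    # Flatten row by row: row i occupies positions i*F .. i*F + F - 1.
    seq = [tok for row in rows for tok in vocab.encode_row(row, fields)]
    # seq == [2, 3, 2, 4]
    inputs, labels = mask_tokens(seq, vocab.mask_id, mask_prob=0.15, rng=random.Random(0))
    print(seq, inputs, labels)
```

Because every row is encoded in the same field order, the flat sequence can be reshaped into (rows, fields). A hierarchical model could therefore encode each row's fields first and then model the sequence of row embeddings.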

Quick Start & Requirements

  • Install dependencies via conda env create -f setup.yml.
  • Requires Python 3.7, PyTorch 1.6.0, HuggingFace Transformers 3.2.0, scikit-learn 0.23.2, and Pandas 1.1.2.
  • The synthetic credit card dataset (24M records, 12 fields) is available via git-lfs or a direct link. The PRSA dataset needs to be downloaded separately from Kaggle.
  • Training commands are provided for Tabular BERT and Tabular GPT2 models.
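
The dependency pins above are old, so it can help to confirm what the active environment actually provides before training. This is a minimal sketch. It assumes the packages are importable as `torch`, `transformers`, `sklearn`, and `pandas` and expose `__version__`. `installed_versions` and `check_versions` are hypothetical helpers, not part of the repository. Local build suffixes such as `+cu101` are ignored when comparing.

```python
import importlib
import sys
from typing import Dict, Iterable, List, Optional

# Versions listed in the README requirements, keyed by import name.
PINNED: Dict[str, str] = {
    "torch": "1.6.0",
    "transformers": "3.2.0",
    "sklearn": "0.23.2",
    "pandas": "1.1.2",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Import each module and read its __version__ (None if unavailable)."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            found[name] = getattr(module, "__version__", None)
    return found


def check_versions(
    installed: Dict[str, Optional[str]], pinned: Dict[str, str] = PINNED
) -> List[str]:
    """Return one line per package that is missing or differs from its pin."""
    problems = []
    for name, wanted in pinned.items():
        got = installed.get(name)
        if got is None:
            problems.append(f"{name}: not found (tested with {wanted})")
        elif str(got).split("+")[0] != wanted:  # drop local suffix like +cu101
            problems.append(f"{name}: found {got}, tested with {wanted}")
    return problems


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 7):
        print(f"Python {sys.version_info.major}.{sys.version_info.minor} (README tests 3.7)")
    for line in check_versions(installed_versions(PINNED)):
        print(line)
```

Taking the installed versions as a plain mapping keeps the comparison logic independent of how the versions are discovered.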

Highlighted Details

  • Implements hierarchical transformers for tabular data.
  • Includes a synthetic credit card transaction dataset (24M records, 12 fields).
  • Features Modified Adaptive Softmax and DataCollatorForLanguageModeling for tabular data.
  • Built upon HuggingFace Transformers library.
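
As background for the adaptive softmax highlight: adaptive softmax assigns the most frequent tokens to a head softmax. It groups rarer tokens into tail clusters, each of which the head addresses through a single cluster entry. A token's log-probability is the head term plus, for tail tokens, the within-cluster term. The stdlib sketch below illustrates that factorization. It also skips positions labelled -100 when averaging the loss, which is one plausible reading of "handling masking". It is not the repository's implementation. `adaptive_log_prob`, `masked_nll`, and the cutoff layout are assumptions; the layout mirrors PyTorch's `nn.AdaptiveLogSoftmaxWithLoss`, where `cutoffs` excludes the vocabulary size.

```python
import math
from typing import List, Sequence

IGNORE_INDEX = -100  # positions with this label are excluded from the loss


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a plain list."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def cluster_of(token_id: int, cutoffs: Sequence[int]) -> int:
    """Return 0 for head tokens (id < cutoffs[0]) or k >= 1 for the k-th tail
    cluster. Ids are assumed sorted by decreasing frequency."""
    for k, cutoff in enumerate(cutoffs):
        if token_id < cutoff:
            return k
    return len(cutoffs)


def adaptive_log_prob(token_id: int, head_logits: Sequence[float],
                      tail_logits: Sequence[Sequence[float]],
                      cutoffs: Sequence[int]) -> float:
    """log p(token) = log p_head(token) for head tokens, otherwise
    log p_head(cluster k) + log p_k(token | cluster k).

    head_logits holds cutoffs[0] shortlist entries followed by one entry per
    tail cluster; tail_logits[k - 1] covers the ids of cluster k."""
    head_lp = log_softmax(head_logits)
    k = cluster_of(token_id, cutoffs)
    if k == 0:
        return head_lp[token_id]
    start = cutoffs[k - 1]  # first id in cluster k
    tail_lp = log_softmax(tail_logits[k - 1])
    return head_lp[cutoffs[0] + k - 1] + tail_lp[token_id - start]


def masked_nll(labels: Sequence[int], head_logits_per_pos, tail_logits_per_pos,
               cutoffs: Sequence[int]) -> float:
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX."""
    total, count = 0.0, 0
    for label, head, tails in zip(labels, head_logits_per_pos, tail_logits_per_pos):
        if label == IGNORE_INDEX:
            continue  # unmasked position: contributes nothing
        total -= adaptive_log_prob(label, head, tails, cutoffs)
        count += 1
    return total / count if count else 0.0
```

Consider `cutoffs=[2, 4]` with a six-token vocabulary. Ids 0 and 1 live in the head, ids 2 and 3 in the first tail cluster, and ids 4 and 5 in the second, so the head softmax has four entries. Exponentiating `adaptive_log_prob` for all six ids and summing should give 1, which is a quick sanity check of the factorization.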

Maintenance & Community

No specific information on maintainers, community channels, or roadmap is provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The code was tested against specific older dependency versions (Python 3.7, PyTorch 1.6.0, Transformers 3.2.0), so running it on current systems may require a carefully pinned environment or code updates. Fetching the dataset through git-lfs can run into bandwidth limits; the direct download link is an alternative.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 90 days
