Algorithm code for predicting user age/gender from ad click history
This repository contains the 1st-place solution code for the 2020 Tencent College Algorithm Contest. It addresses the problem of predicting user demographics (age and gender) from historical ad click data. The solution may serve as a reference for participants in similar data science competitions.
How It Works
The approach combines Word2Vec embeddings with a pre-trained BERT model. Each user's click history is processed into features, which are fed into a model that uses the embeddings to represent the click sequence and BERT to model the context of ad interactions. This hybrid approach aims to capture complex user behavior patterns for accurate demographic prediction.
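As a rough illustration of the feature-extraction step, the sketch below groups raw click records into one time-ordered sequence of ad IDs per user, the kind of "sentence" an embedding model can be trained on. The field names (user_id, time, ad_id) and the helper build_click_sequences are hypothetical and not taken from the repository, whose actual preprocessing may differ.

```python
from collections import defaultdict


def build_click_sequences(click_log):
    """Group click records into one time-ordered ad-ID sequence per user.

    `click_log` is an iterable of dicts with hypothetical keys
    "user_id", "time", and "ad_id" (assumed names, not the repo's schema).
    """
    per_user = defaultdict(list)
    for row in click_log:
        per_user[row["user_id"]].append((row["time"], row["ad_id"]))

    # Sort each user's clicks by time, then keep only the ad IDs as string tokens.
    return {
        user: [str(ad) for _, ad in sorted(clicks, key=lambda c: c[0])]
        for user, clicks in per_user.items()
    }


# Toy data for illustration only.
click_log = [
    {"user_id": 1, "time": 3, "ad_id": 907},
    {"user_id": 1, "time": 1, "ad_id": 821},
    {"user_id": 2, "time": 2, "ad_id": 907},
    {"user_id": 1, "time": 2, "ad_id": 1001},
]
sequences = build_click_sequences(click_log)
print(sequences)
```

Each resulting list can then be treated like a sentence of tokens and handed to an embedding trainer or a sequence model.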
Quick Start & Requirements
pip install transformers==2.8.0 pandas gensim scikit-learn filelock gdown
Place the competition data in the data directory.
Place any pre-trained files in the data or BERT directories respectively.
Run bash run.sh to run the entire pipeline.
Highlighted Details
Maintenance & Community
The repository is maintained by guoday. No specific community channels or roadmap are indicated in the README.
Licensing & Compatibility
The README does not explicitly state a license. The code is provided for the Tencent competition, and commercial use or linking with closed-source projects may require clarification.
Limitations & Caveats
The setup was specified for a particular environment (Ubuntu 16.04, 256GB RAM, 4x P100 GPUs) and pins an older library version (transformers==2.8.0), both of which may be hard to reproduce on modern systems. The pre-training steps for BERT are resource-intensive.
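Before running the pipeline, it can help to confirm that the installed packages match the pinned versions. The following is a minimal sketch using the standard library's importlib.metadata; the helper find_version_mismatches is a hypothetical name and is not part of the repository.

```python
from importlib import metadata


def find_version_mismatches(pins, get_version=metadata.version):
    """Return {package: (wanted, installed)} for every pin that is not met.

    `installed` is None when the package is not installed at all.
    `get_version` is injectable so the check can be exercised without
    touching the real environment.
    """
    mismatches = {}
    for package, wanted in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


# The only version pinned in the install command above is transformers==2.8.0.
print(find_version_mismatches({"transformers": "2.8.0"}))
```

An empty dictionary means every pinned package is present at the expected version.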