14th place in Kaggle / Santander Product Recommendation
Kaggle Santander Product Recommendation Competition
Santander Bank offers a lending hand to their customers through personalized product recommendations. In their second competition, Santander is challenging Kagglers to predict which products their existing customers will use in the next month based on their past behavior and that of similar customers.
The competition data consists of customer data from 2015-01 to 2016-05 (17 monthly timestamps in total), including each customer's demographic information and product purchase behavior. The competition challenges you to predict the top 7 of 24 products that each customer in the test data is most likely to purchase in 2016-06.
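Each month in the data is a snapshot of which products a customer holds. A "purchase" in month M can therefore be read as a product flag flipping from 0 in month M-1 to 1 in month M. The sketch below assumes exactly that definition. It treats a customer missing from the previous month as owning nothing. The product names are an illustrative subset of the 24 `ind_*_ult1` flag columns, and `added_products` is a hypothetical helper, not code from this repository.

```python
from typing import Dict, List

# Illustrative subset of the 24 product flag columns (assumed for this sketch).
PRODUCTS = ["ind_cco_fin_ult1", "ind_recibo_ult1", "ind_reca_fin_ult1"]


def added_products(prev_flags: Dict[str, int], curr_flags: Dict[str, int]) -> List[str]:
    """Products held this month (flag == 1) that were not held last month.

    A product missing from prev_flags counts as not owned (flag 0), which
    also covers customers who first appear in the current month.
    """
    return [
        p for p in PRODUCTS
        if curr_flags.get(p, 0) == 1 and prev_flags.get(p, 0) == 0
    ]


may = {"ind_cco_fin_ult1": 1, "ind_recibo_ult1": 0, "ind_reca_fin_ult1": 0}
june = {"ind_cco_fin_ult1": 1, "ind_recibo_ult1": 1, "ind_reca_fin_ult1": 0}
print(added_products(may, june))  # ['ind_recibo_ult1']
```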
The evaluation metric is MAP@7, which is difficult to optimize directly during training. Instead, multi-class log loss (mlogloss) was widely used among Kagglers as an indirect proxy objective.
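MAP@7 depends only on the rank order of the predicted products. It is therefore piecewise constant in the model's scores and gives no useful gradient, whereas mlogloss is smooth in the predicted probabilities. A minimal sketch of both quantities follows. It assumes a common convention: a customer with no added products contributes an average precision of 0. The function names `apk`, `mapk` and `mlogloss` are illustrative, not taken from this repository.

```python
import math
from typing import Dict, Sequence


def apk(actual: Sequence[str], predicted: Sequence[str], k: int = 7) -> float:
    """Average precision at k for a single customer."""
    if not actual:
        return 0.0  # assumed convention for customers with no added products
    actual_set = set(actual)
    hits = 0
    score = 0.0
    seen = set()
    for i, p in enumerate(predicted[:k]):
        if p in actual_set and p not in seen:
            hits += 1
            score += hits / (i + 1)  # precision at cut-off i + 1
        seen.add(p)
    return score / min(len(actual_set), k)


def mapk(actuals, predicteds, k: int = 7) -> float:
    """MAP@k: mean of apk over all customers."""
    return sum(apk(a, p, k) for a, p in zip(actuals, predicteds)) / len(actuals)


def mlogloss(labels: Sequence[str], probs: Sequence[Dict[str, float]],
             eps: float = 1e-15) -> float:
    """Multi-class log loss: mean negative log-probability of the true class."""
    total = 0.0
    for y, p in zip(labels, probs):
        total -= math.log(max(p.get(y, 0.0), eps))  # clip to avoid log(0)
    return total / len(labels)


print(mapk([["a"], ["b"]], [["a", "b"], ["a", "b"]]))  # 0.75
```

The second customer's only added product appears at position 2, so it earns precision 1/2. This is why ranking quality, rather than calibrated probabilities, is what MAP@7 rewards.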
Thanks to BreakfastPirate's generous sharing, using only the 2015-06 data as training data appeared to perform well on the leaderboard (reaching almost 0.03). A single model was enough to place you at the top of the leaderboard, since MAP@7 made the effect of ensembling relatively weak.
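One way to express "train on 2015-06 only" is to keep the customers present in 2015-06 and use their 2015-05 holdings as lag features. You then emit one row per product added in 2015-06, so that a 24-class classifier can be trained against mlogloss. The sketch below assumes that row layout. The nested-dict `snapshots` structure, the short product names and `build_training_rows` are hypothetical illustrations, not the data format or code used in this repository.

```python
from typing import Dict, List, Tuple

Flags = Dict[str, int]       # product name -> 0/1 ownership flag (assumed layout)
Snapshot = Dict[int, Flags]  # customer id -> flags for one month


def build_training_rows(snapshots: Dict[str, Snapshot], train_month: str,
                        prev_month: str) -> List[Tuple[int, Flags, str]]:
    """Return (customer_id, lag_features, target_product) rows for one month.

    Lag features are last month's flags; a customer absent last month gets
    an empty dict (treated as owning nothing). Customers who added no
    product produce no rows, since they have no class label.
    """
    prev = snapshots.get(prev_month, {})
    rows = []
    for cust_id, curr_flags in sorted(snapshots[train_month].items()):
        prev_flags = prev.get(cust_id, {})
        for product in sorted(curr_flags):
            if curr_flags[product] == 1 and prev_flags.get(product, 0) == 0:
                rows.append((cust_id, prev_flags, product))
    return rows


snapshots = {
    "2015-05": {1: {"cco": 1, "recibo": 0}, 2: {"cco": 0, "recibo": 0}},
    "2015-06": {1: {"cco": 1, "recibo": 1},
                2: {"cco": 1, "recibo": 1},
                3: {"cco": 1, "recibo": 0}},
}
for row in build_training_rows(snapshots, "2015-06", "2015-05"):
    print(row)
```

Customer 2 adds two products and so contributes two rows with identical features. This duplication is what lets a plain multi-class objective spread probability over several likely products.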
As always, feature engineering appeared to be the most important factor in this competition. A good CV scheme mattered too, for finding the hyperparameters that squeeze the most performance out of the given data.
Submission | CV mlogloss | Public LB | Public Rank | Private LB | Private Rank |
---|---|---|---|---|---|
bare_minimum | 1.84515 | - | - | 0.0165546 | 1406 |
reduced version by kweonwooj | 0.9492806 | - | - | 0.0302238 | 208 |
best single model by kweonwooj | 0.9396864 | 0.029975 | 182 | 0.0302794 | 175 |
reproduced version of 8th place solution | 0.885272 | - | - | 0.0309659 | 14 |
The reproduced version of the 8th place solution is a direct fork of the GitHub repository by Alexander Ponomarchuk and sh1ng. I added personal comments and an execution log. All credit goes to the original authors.
[Data]
Place the data in the `root_input` directory. You can download the data from here.
[Code]
The results above can be replicated by running `python code/main.py` in each of the directories.
Make sure you are using Python 3.5.2 with the same library versions as specified in `requirements.txt`.
[Submit]
Submit the resulting csv file here and verify the score.
for bare_minimum
for reduced version by kweonwooj
for reproduced version of 8th place solution
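For reference, the submission file needs a top-7 product list per customer built from the model's per-product probabilities. The minimal sketch below makes two assumptions. First, it assumes the layout of the competition's sample submission: an `ncodpers,added_products` header with space-separated product names. Verify this against `sample_submission.csv`. Second, it drops products the customer already held in 2016-05, since those cannot be newly added. `top_k_new_products` and `write_submission` are hypothetical helpers, not functions from this repository.

```python
import csv
import io
from typing import Dict, List, Set, TextIO


def top_k_new_products(probs: Dict[str, float], owned: Set[str], k: int = 7) -> List[str]:
    """Rank products by predicted probability, skipping ones already owned."""
    candidates = [p for p in probs if p not in owned]
    # Descending probability; ties broken by product name for determinism.
    candidates.sort(key=lambda p: (-probs[p], p))
    return candidates[:k]


def write_submission(out: TextIO, predictions: Dict[int, Dict[str, float]],
                     owned_last_month: Dict[int, Set[str]], k: int = 7) -> None:
    """Write one row per customer: id, then up to k space-separated products."""
    writer = csv.writer(out)
    writer.writerow(["ncodpers", "added_products"])
    for cust_id, probs in predictions.items():
        owned = owned_last_month.get(cust_id, set())
        writer.writerow([cust_id, " ".join(top_k_new_products(probs, owned, k))])


# In practice, pass open("submission.csv", "w", newline="") instead of StringIO.
buf = io.StringIO()
write_submission(buf, {15889: {"a": 0.1, "b": 0.6, "c": 0.3}}, {15889: {"b"}})
print(buf.getvalue())
```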