Recsys Spark

ItemCF, UserCF, and Swing implemented in Spark SQL: recommender systems, recommendation algorithms, collaborative filtering

Project README

recsys_spark

Spark SQL implementations of ItemCF, UserCF, and Swing, forming a collaborative filtering (CF) recall module for recommendation.

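Data format

Item transaction data with three fields: user ID, item ID, and transaction date (userid, itemid, date). Blacklisted users and items are filtered out and do not take part in the computation.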

date userid itemid
2019-05-09 1901140040225006 103943
2019-05-09 1806041288325006 56610
2019-05-09 1812060050236636 16368
2019-05-09 1812060050261006 101562
2019-05-09 1901160070407006 79874
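
The blacklist filtering described above can be illustrated with a small sketch in plain Python. It is not the project's Spark SQL code: the function name `load_transactions` and the two blacklist parameters are hypothetical, and the rows are the sample rows from the table.

```python
# Sample rows in the "date userid itemid" layout shown above.
RAW = """\
2019-05-09 1901140040225006 103943
2019-05-09 1806041288325006 56610
2019-05-09 1812060050236636 16368
"""


def load_transactions(text, user_blacklist=frozenset(), item_blacklist=frozenset()):
    """Parse whitespace-separated rows and drop blacklisted users and items.

    Hypothetical helper for illustration; IDs are kept as strings.
    """
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        date, userid, itemid = parts
        if userid in user_blacklist or itemid in item_blacklist:
            continue  # blacklisted entries do not take part in the computation
        rows.append((date, userid, itemid))
    return rows


# Example: item 56610 is treated as blacklisted, so its row is dropped.
rows = load_transactions(RAW, item_blacklist={"56610"})
```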

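ItemCF (item-based collaborative filtering)

An i2i2u algorithm: the items a user has previously purchased act as a bridge between the user and other items. Similarity is based on item co-occurrence, with a penalty applied to the long purchase sequences of highly active users. The similarity formula is:

[Formula image, alt text "swing formula"]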
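
The formula is an image that is not reproduced in this text, so the following is only a sketch under an assumption: it uses the common co-occurrence similarity with an inverse-user-frequency penalty, w(i, j) = sum over u in N(i) ∩ N(j) of 1 / log(1 + |N(u)|), divided by sqrt(|N(i)| * |N(j)|). Whether the project uses exactly this weighting is not confirmed by the text. The function name `itemcf_similarity` is hypothetical, and the code is plain Python rather than Spark SQL.

```python
import math
from collections import defaultdict
from itertools import combinations


def itemcf_similarity(pairs):
    """pairs: iterable of (userid, itemid). Returns {(i, j): score} for both orders."""
    user_items = defaultdict(set)  # N(u): items bought by user u
    item_users = defaultdict(set)  # N(i): users who bought item i
    for u, i in pairs:
        user_items[u].add(i)
        item_users[i].add(u)

    co = defaultdict(float)
    for items in user_items.values():
        if len(items) < 2:
            continue  # a single item produces no co-occurrence
        # Assumed penalty: users with long purchase sequences contribute less.
        weight = 1.0 / math.log(1.0 + len(items))
        for i, j in combinations(sorted(items), 2):
            co[(i, j)] += weight
            co[(j, i)] += weight

    # Normalise by the popularity of both items.
    return {
        (i, j): c / math.sqrt(len(item_users[i]) * len(item_users[j]))
        for (i, j), c in co.items()
    }


# Toy data: u1 bought a, b; u2 bought a, b, c.
sim = itemcf_similarity(
    [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u2", "c")]
)
```

In this toy data, a and b co-occur for both users, so their score is higher than that of a and c, which co-occur only for the more active user u2.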

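UserCF (user-based collaborative filtering)

A u2u2i algorithm: users act as a bridge between a user and the items that other users purchased. Similarity is based on user co-occurrence, with a penalty applied to the long buyer sequences of popular items. The similarity formula is the ItemCF formula with i, j (item 1, item 2) in the numerator and denominator replaced by u, v (user 1, user 2), and the user sequence N(u) replaced by the item sequence N(i).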
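
The substitution described above can be sketched by swapping the roles of users and items. As with ItemCF, the 1 / log(1 + |N(i)|) penalty and the square-root normalisation are assumptions (the formula image is not available in this text), the function name `usercf_similarity` is hypothetical, and the code is plain Python rather than Spark SQL.

```python
import math
from collections import defaultdict
from itertools import combinations


def usercf_similarity(pairs):
    """pairs: iterable of (userid, itemid). Returns {(u, v): score} for both orders."""
    user_items = defaultdict(set)  # N(u): items bought by user u
    item_users = defaultdict(set)  # N(i): users who bought item i
    for u, i in pairs:
        user_items[u].add(i)
        item_users[i].add(u)

    co = defaultdict(float)
    for users in item_users.values():
        if len(users) < 2:
            continue  # an item with one buyer links no users
        # Assumed penalty: popular items with many buyers contribute less.
        weight = 1.0 / math.log(1.0 + len(users))
        for u, v in combinations(sorted(users), 2):
            co[(u, v)] += weight
            co[(v, u)] += weight

    # Normalise by how many items each of the two users bought.
    return {
        (u, v): c / math.sqrt(len(user_items[u]) * len(user_items[v]))
        for (u, v), c in co.items()
    }


# Toy data: item a was bought by u1, u2, u3; item b by u1, u2.
sim = usercf_similarity(
    [("u1", "a"), ("u2", "a"), ("u3", "a"), ("u1", "b"), ("u2", "b")]
)
```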

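Swing (graph-based collaborative filtering)

An i2i2u algorithm: the items a user has already purchased act as a bridge between the user and other items. To measure the similarity of items i and j, consider the users u and v who both purchased i and j. The fewer items these two users purchased in common, the higher the similarity of i and j. The similarity formula is:

[Formula image, alt text "swing formula"]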
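
The formula image is not reproduced in this text. The sketch below assumes the commonly cited Swing form, in which every pair of users (u, v) who both bought i and j adds 1 / (alpha + |I(u) ∩ I(v)|) to the score of (i, j), where I(u) is the set of items bought by u. The smoothing parameter `alpha`, its default of 1.0, the choice to count each unordered user pair once, and the function name `swing_similarity` are all assumptions. The loop over all user pairs is quadratic and only meant for small illustrative inputs, not as a stand-in for the Spark SQL job.

```python
from collections import defaultdict
from itertools import combinations


def swing_similarity(pairs, alpha=1.0):
    """pairs: iterable of (userid, itemid). Returns {(i, j): score} for both orders."""
    user_items = defaultdict(set)  # I(u): items bought by user u
    for u, i in pairs:
        user_items[u].add(i)

    sim = defaultdict(float)
    for u, v in combinations(sorted(user_items), 2):
        common = user_items[u] & user_items[v]
        if len(common) < 2:
            continue  # the user pair must share both i and j
        # The fewer items the two users share, the larger the contribution.
        weight = 1.0 / (alpha + len(common))
        for i, j in combinations(sorted(common), 2):
            sim[(i, j)] += weight
            sim[(j, i)] += weight
    return dict(sim)


# Toy data: u1 and u2 bought a, b; u3 and u4 bought a, b, c.
sim = swing_similarity(
    [
        ("u1", "a"), ("u1", "b"),
        ("u2", "a"), ("u2", "b"),
        ("u3", "a"), ("u3", "b"), ("u3", "c"),
        ("u4", "a"), ("u4", "b"), ("u4", "c"),
    ]
)
```

Here the pair (u3, u4) shares three items, so it contributes 1/4 to each item pair it covers, while each of the other five user pairs shares two items and contributes 1/3 to (a, b) only.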

Computing similar items

ItemCF results stored in HBase, queried through Phoenix:

item => [[item1, score],[item2, score]...]

spu recommend
00017_201209 [[201210,0.07535],[221502,0.03041],[215272,0.01753],[212219,0.01753],[228212,0.01688]
00042_103060 [[61212,0.03611],[10525,0.02616],[101486,0.03138],[91764,0.01898],[95527,0.02186],[661
0006d_25593 [[6598,0.00319],[11129,0.00762],[178,0.00696],[8558,0.0041],[11398,0.0029],[25536,0.012
00077_35837 [[25518,0.01044],[36420,0.41703],[36357,0.15762],[83810,0.02686],[103838,0.02686],[1038
0007c_9700 [[227970,0.03401],[219462,0.02626],[219401,0.02626],[223635,0.02247],[223641,0.02247],[2
000cb_33363 [[8572,0.00877],[19665,0.00756],[12812,0.01092],[11853,0.0094],[8528,0.01173],[1705,0.0
000d0_50738 [[119582,0.03503],[100296,0.02922],[97248,0.02309],[72044,0.02153],[79245,0.02023],[119
000d5_68111 [[50729,0.00632],[67871,0.02315],[68081,0.01277],[9624,0.01253],[57234,0.00996],[67983,
000dd_45311 [[3721,0.02095],[21908,0.0156],[25633,0.01145],[5002,0.01438],[28633,0.02605],[17088,0.

Computing recommendation results

user => [item1, item2, item3...]

userid recommend
00000_180731 [50648,14253,211049,14255,209517,112985,48507,13458,206846,35472,18769,97610,78105,21
00003_532933 [203038,78262,81480,120623,203040,81447,100994,203009,101491,81457,114550,55115,80139
00007_552871 [105023,10199,100894,100565,99769,96980,30781,115965,230960,95059,11129,104702,51831,6
0000b_194813 [231082,60365,101950,57700,209504,113725,101939,5906,94771,59979,237823,102324,229264
0000e_398677 [210020,210019,74081,91787,48428,90769,17449,91800,91822,17448,91823,91803,17437,1162
0000e_590120 [106907,72369,94907,74972,79603,97245,202614,97243,207393,229353,74063,78596,210969,11
00010_180604 [73633,24509,24507,7481,101877,107612,116350,100115,34379,229431,113725,229618,236254,
00011_536634 [209481,210381,112120,234451,113968,119215,64699,121035,106867,121057,103750,48503,12,
00013_180604 [212154,212156,212157,17141,62421,69801,232732,62407,211132,211029,37857,215047,8741,6
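
One way the i2i2u step could turn a similar-item table into a per-user list is sketched below. The text does not state how scores are combined, so summing the similarity scores of each candidate over the user's purchased items, excluding items the user already bought, and breaking ties by item ID are all assumptions. The function name `recommend` and the parameter `top_n` are hypothetical.

```python
from collections import defaultdict


def recommend(history, similar_items, top_n=10):
    """Rank candidate items for one user.

    history: set of items the user has bought.
    similar_items: {item: [[similar_item, score], ...]}, the layout shown above.
    """
    scores = defaultdict(float)
    for item in history:
        for candidate, score in similar_items.get(item, []):
            if candidate in history:
                continue  # assumed: do not recommend items already bought
            scores[candidate] += score  # assumed: scores are summed per candidate
    # Highest score first; ties broken by item ID for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


# Toy similar-item table with made-up scores.
similar_items = {
    "a": [["b", 0.5], ["c", 0.2]],
    "d": [["c", 0.4], ["e", 0.1]],
}
result = recommend({"a", "d"}, similar_items)
```

In this toy input, item c is reached from both a and d, so its summed score puts it ahead of b.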
README Source: xiaogp/recsys_spark