Title | Data mining - market basket analysis for retail point of sale data |
---|---|
Author | Prateek Mehta |
Course | Digital Marketing |
Institution | Institute of Business Management |
Data Mining Overview
Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Below is the entity-relationship diagram depicting the relationship of the data shared for the project.
Challenges faced:
1. The data was split into 3 sets: (1) Point of Sale (PoS) data, (2) customer data and (3) pricing data. The PoS data set held the receipt number and the articles purchased on each receipt by the customer. The difficulty was that this data set, with more than 3,367,020 objects of 3 variables, was too large to be massaged in Excel. Each SKU bought was listed against its unique receipt number.
2. Summarization of this data could not be supported in Excel or Access. We tried, but were limited by the data set size that Excel and Access support.
3. Within the data set, all unique SKUs had to be summarized on a per-receipt basis, which means that the SKUs needed to be unique identifiers.
4. The chronological order of the SKU listing was important: what was purchased first and what followed or preceded it. This matters because it enables better merchandising planning.
5. Association rule mining was done using the Apriori algorithm, and the major challenge was to decide on the right balance of minimum length, support and confidence. It was difficult to arrive at the right mix of LHS and RHS with the required support and confidence. The right length had to be established to get the right number of articles in a receipt (not too many and not too few, based on the average SKUs per receipt).
6. Another challenge was to establish the RHS association so as to identify not only the fastest-moving article but also the least-moving article. The least-moving article had to be identified to surface the right mix of articles with which a promotion could be planned.
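The per-receipt summarization described in challenge 3 can be sketched in base R. This is only an illustration on made-up data: the data frame `pos` and its column names `receipt` and `sku` are assumptions, not the project's actual files.

```r
# Hypothetical toy point-of-sale data in long format: one row per scanned article.
pos <- data.frame(
  receipt = c(1, 1, 1, 2, 2, 3),
  sku     = c("39", "48", "39", "38", "39", "48"),
  stringsAsFactors = FALSE
)

# Drop repeated (receipt, sku) pairs so that each SKU appears once per receipt.
# unique() keeps the first occurrence, so the original row order is preserved.
pos_unique <- unique(pos[, c("receipt", "sku")])

# Collapse to one basket (character vector of SKUs) per receipt.
baskets <- split(pos_unique$sku, pos_unique$receipt)

# Average number of distinct SKUs per receipt, a rough guide when choosing rule length.
avg_skus <- mean(lengths(baskets))
```

Because this runs in memory without a spreadsheet row limit, the same pattern may scale to data sets that Excel or Access cannot hold, subject to available RAM.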
Market Basket Analysis
Market Basket Analysis uncovers associations between products by looking for combinations of products that frequently co-occur in transactions. It allows supermarkets to identify relationships between the products that customers buy for various purposes.
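To make "frequently co-occur" concrete, the sketch below computes the three standard rule measures (support, confidence and lift) by hand in base R. The baskets are invented for illustration, and the helper names `support` and `rule_measures` are hypothetical, not functions from any package.

```r
# Hypothetical toy baskets; the item labels are illustrative only.
baskets <- list(
  c("39", "48"),
  c("38", "39", "48"),
  c("38", "39"),
  c("48"),
  c("38", "39", "41")
)

# Fraction of baskets that contain every item in `items`.
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Support, confidence and lift for a rule lhs => rhs.
rule_measures <- function(lhs, rhs, baskets) {
  supp <- support(c(lhs, rhs), baskets)   # share of baskets with lhs and rhs together
  conf <- supp / support(lhs, baskets)    # share of lhs baskets that also contain rhs
  lift <- conf / support(rhs, baskets)    # confidence relative to the baseline rate of rhs
  c(support = supp, confidence = conf, lift = lift)
}

m <- rule_measures(lhs = "38", rhs = "39", baskets)
```

A lift above 1 suggests the items appear together more often than they would if purchased independently.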
Retail Market Basket Data Set
This project analyses the retail market basket data set supplied by an anonymous Belgian retail supermarket store. The data were collected over three non-consecutive periods, resulting in approximately 5 months of data. The total number of receipts collected equals 88,162. Over the entire data collection period, the supermarket store carried 16,470 unique SKUs (Stock Keeping Units). In total, 5,133 customers purchased at least one product in the supermarket during the data collection period.
Project Approach
+ Grouping products that co-occur in the design of a store's layout to increase the chance of cross-selling. For this purpose, we use the Apriori, Eclat and Frequent Pattern algorithms to study customer behaviour.
+ Targeting marketing campaigns by sending out promotional offers to customers related to the products they purchased.
Algorithms and Packages used
+ arules
+ arulesViz
+ eclat
+ Frequent Pattern growth
+ bigmemory
Data Statistics
retail
##     lhs            rhs  support    confidence lift
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 72  {170,48}    => {38} 0.01744516 0.9877970  5.583878
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042
## 33  {170}       => {38} 0.03437989 0.9780574  5.528821
## 19  {110}       => {38} 0.03090901 0.9753042  5.513258
## 1   {37}        => {38} 0.01186452 0.9739292  5.505485
## 113 {36,39,48}  => {38} 0.01225018 0.9677419  5.470509
## 64  {36,48}     => {38} 0.01542615 0.9604520  5.429300
## 66  {36,39}     => {38} 0.02206166 0.9548355  5.397551
## 28  {36}        => {38} 0.03164629 0.9502725  5.371757
## 2   {286}       => {38} 0.01265852 0.9433643  5.332706
## 121 {38,39,48}  => {41} 0.02258343 0.3262865  1.924795
## 125 {32,39,48}  => {41} 0.01867018 0.3047020  1.797466
## 92  {38,48}     => {41} 0.02692770 0.2988419  1.762897
## 95  {38,39}     => {41} 0.03460675 0.2949251  1.739792
## 102 {32,39}     => {41} 0.02675756 0.2790065  1.645886
## 86  {39,89}     => {48} 0.02410336 0.7730084  1.617419
Sorting by confidence
inspect(sort(retailrules,by = "confidence")[1:20])
##     lhs            rhs  support    confidence lift
## 110 {110,39,48} => {38} 0.01169438 0.9942141  5.620153
## 116 {170,39,48} => {38} 0.01353191 0.9892206  5.591925
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 72  {170,48}    => {38} 0.01744516 0.9877970  5.583878
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042
## 33  {170}       => {38} 0.03437989 0.9780574  5.528821
## 19  {110}       => {38} 0.03090901 0.9753042  5.513258
## 1   {37}        => {38} 0.01186452 0.9739292  5.505485
## 113 {36,39,48}  => {38} 0.01225018 0.9677419  5.470509
## 64  {36,48}     => {38} 0.01542615 0.9604520  5.429300
## 66  {36,39}     => {38} 0.02206166 0.9548355  5.397551
## 28  {36}        => {38} 0.03164629 0.9502725  5.371757
## 2   {286}       => {38} 0.01265852 0.9433643  5.332706
## 119 {38,41,48}  => {39} 0.02258343 0.8386689  1.459077
## 105 {41,48}     => {39} 0.08355074 0.8168108  1.421049
## 83  {225,48}    => {39} 0.01587986 0.8064516  1.403027
## 123 {32,41,48}  => {39} 0.01867018 0.7978672  1.388092
## 79  {310,48}    => {39} 0.01527869 0.7960993  1.385016
## 111 {36,38,48}  => {39} 0.01225018 0.7941176  1.381569
We find a difference between the two sets of rules above. We first inspected the rules sorted by lift: there, some rules have a relatively high lift but low confidence. In the second set of rules, sorted by confidence, the confidence is almost 1, which tells us that customers who purchased items "110, 39, 48" almost always also purchased "38", and so on.
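The relation between the two measures is lift = confidence / support(RHS), which explains why a rule can rank high on one and low on the other. The sketch below checks this with numbers copied from the rule listings in this report; the variable names are illustrative only.

```r
# Values copied from the rule listings in this report.
supp_38_39 <- 0.11734080   # support of {38} => {39}
conf_38_39 <- 0.6633111    # confidence of {38} => {39}
conf_110   <- 0.9942141    # confidence of {110,39,48} => {38}
conf_119   <- 0.8386689    # confidence of {38,41,48} => {39}
lift_119   <- 1.459077     # lift of {38,41,48} => {39}

# support({38}) = support({38,39}) / confidence({38} => {39})
supp_38 <- supp_38_39 / conf_38_39

# lift = confidence / support(rhs): item 38 is in under 18% of receipts,
# so a confidence near 1 translates into a lift above 5.
lift_110 <- conf_110 / supp_38

# Rearranged for item 39: it is in more than half of all receipts,
# so even a confidence of 0.84 yields a lift of only about 1.46.
supp_39 <- conf_119 / lift_119
```

Under these figures `supp_38` comes out near 0.177 and `supp_39` near 0.575, so rules ending in the very common item "39" can be confident without being especially informative.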
Plotting the rules
plot(retailrules)
Graph method: plot(head(sort(retailrules),10), method = "graph", control = list(type ="items"))
Grouped method: plot(head(sort(retailrules),10), method = "grouped")
Matrix method: plot(head(sort(retailrules),20), method = "matrix", measure = c("lift", "confidence"), control=list(reorder = T))
## Itemsets in Antecedent (LHS)
##  [1] "{170}"   "{41}"    "{39,41}" "{32,39}" "{39}"    "{32}"    "{38}"
##  [8] "{41,48}" "{38,41}" "{38,48}" "{48}"    "{32,48}" "{39,48}" "{38,39}"
## Itemsets in Consequent (RHS)
## [1] "{39}" "{48}" "{41}" "{38}"
Double Decker:
samplerule
## 110 {110,39,48} => {38} 0.01169438 0.9942141  5.620153
plot(samplerule, method = "doubledecker", data = retail)
Looking at some interesting measures
im
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042
## 113 {36,39,48}  => {38} 0.01225018 0.9677419  5.470509
## 66  {36,39}     => {38} 0.02206166 0.9548355  5.397551
rules110
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 19  {110}       => {38} 0.03090901 0.9753042  5.513258
## 108 {110,38,48} => {39} 0.01169438 0.7575312  1.317917
## 61  {110,48}    => {39} 0.01176244 0.7514493  1.307336
Writing the rules to a CSV file and converting the rule set to a data frame
write(retailrules, file = "Retail_Rules.csv", sep = ",", quote = TRUE, row.names = FALSE)
retailrules.df
##     {48}        => {39} 0.33055058 0.6916340  1.2032726
## 56  {39}        => {48} 0.33055058 0.5750765  1.2032726
## 54  {41}        => {39} 0.12946621 0.7637337  1.3287082
## 50  {38}        => {39} 0.11734080 0.6633111  1.1539977
## 53  {41}        => {48} 0.10228897 0.6034125  1.2625621
## 52  {32}        => {39} 0.09590300 0.5574603  0.9698434
## 51  {32}        => {48} 0.09112770 0.5297026  1.1083338
## 49  {38}        => {48} 0.09010685 0.5093614  1.0657723
## 105 {41,48}     => {39} 0.08355074 0.8168108  1.4210493
## 106 {39,41}     => {48} 0.08355074 0.6453478  1.3503063
## 107 {39,48}     => {41} 0.08355074 0.2527623  1.4910695
## 97  {38,48}     => {39} 0.06921349 0.7681269  1.3363513
Frequent Pattern Algorithm from SPMF
# Rules generated by Apriori
inspect(head(sort(retailrules, by = "lift")))
##     lhs            rhs  support    confidence lift
## 110 {110,39,48} => {38} 0.01169438 0.9942141  5.620153
## 116 {170,39,48} => {38} 0.01353191 0.9892206  5.591925
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 72  {170,48}    => {38} 0.01744516 0.9877970  5.583878
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042
# Rules generated by Frequent Pattern
## 37 ==> 38 #SUP: 1046 #CONF: 0.9739292364990689 #LIFT: 5.505485339076103
## 110 ==> 38 #SUP: 2725 #CONF: 0.9753042233357194 #LIFT: 5.513257946763509
## 170 ==> 38 #SUP: 3031 #CONF: 0.9780574378831881 #LIFT: 5.528821482345322
## 39 110 ==> 38 #SUP: 1740 #CONF: 0.9891984081864695 #LIFT: 5.591799824476502
## 39 170 ==> 38 #SUP: 2019 #CONF: 0.9805730937348227 #LIFT: 5.543042131947258
## 48 110 ==> 38 #SUP: 1361 #CONF: 0.986231884057971 #LIFT: 5.575030479758838
## 48 170 ==> 38 #SUP: 1538 #CONF: 0.9877970456005138 #LIFT: 5.583878118378591
## 39 48 110 ==> 38 #SUP: 1031 #CONF: 0.9942140790742526 #LIFT: 5.62015270834472
## 39 48 170 ==> 38 #SUP: 1193 #CONF: 0.9892205638474295 #LIFT: 5.591925067319639
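The Frequent Pattern output reports support as an absolute receipt count (#SUP), whereas the Apriori listing reports it as a fraction of all receipts. A minimal base R check, assuming the 88,162 receipts stated for the data set are the denominator, shows that the two agree:

```r
n_receipts <- 88162  # total receipts reported for the data set

# Absolute support counts (#SUP) copied from the Frequent Pattern output.
fp_counts <- c("39 48 110 ==> 38" = 1031,
               "39 48 170 ==> 38" = 1193,
               "39 110 ==> 38"    = 1740)

# Relative supports reported by Apriori for the same three rules.
apriori_support <- c(0.01169438, 0.01353191, 0.01973639)

# Convert counts to fractions of all receipts.
fp_support <- fp_counts / n_receipts

# Largest absolute gap between the two outputs (rounding noise only).
max_gap <- max(abs(unname(fp_support) - apriori_support))
```

The gap stays at the level of rounding in the printed Apriori values, which supports reading the two outputs as the same rules expressed on different scales.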
Items with least lift and confidence
inspect(tail(sort(retailrules, by = "lift")))
##     lhs            rhs  support    confidence lift
## 27  {413}       => {39} 0.01281731 0.6010638  1.0457028
## 57  {110,38}    => {48} 0.01543749 0.4994495  1.0450331
## 20  {110}       => {48} 0.01565300 0.4939155  1.0334539
## 63  {36,38}     => {48} 0.01542615 0.4874552  1.0199365
## 29  {36}        => {48} 0.01606134 0.4822888  1.0091266
## 52  {32}        => {39} 0.09590300 0.5574603  0.9698434
How the rules could be useful:
+ From all 3 algorithms it is confirmed that item "38" is sold together with items "110, 39, 48, 170", with lift values above 5 and confidence close to 1.
+ Consider promotional activity for the items which have a lift value of less than 2....
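One way to act on these two points is to split the rule table by lift. The sketch below does this in base R on a handful of rules copied from the listings above; the data frame `rules` is built by hand for illustration, and the 0.95 confidence cut-off is an assumption standing in for "close to 1".

```r
# A few rules copied from the listings above, as a plain data frame.
rules <- data.frame(
  lhs        = c("{110,39,48}", "{170,39,48}", "{38,39,48}", "{41,48}", "{32}"),
  rhs        = c("{38}", "{38}", "{41}", "{39}", "{39}"),
  support    = c(0.01169438, 0.01353191, 0.02258343, 0.08355074, 0.09590300),
  confidence = c(0.9942141, 0.9892206, 0.3262865, 0.8168108, 0.5574603),
  lift       = c(5.620153, 5.591925, 1.924795, 1.421049, 0.9698434),
  stringsAsFactors = FALSE
)

# Strong associations: lift above 5 and confidence close to 1
# (0.95 is an assumed cut-off, not a value from the report).
strong <- rules[rules$lift > 5 & rules$confidence > 0.95, ]

# Promotion candidates: lift below 2, ordered from weakest to strongest association.
promo <- rules[rules$lift < 2, ]
promo <- promo[order(promo$lift), ]
```

With the full rule set converted to a data frame, the same two filters would apply unchanged.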