Data mining - market basket analysis for retail point of sale data

Title Data mining - market basket analysis for retail point of sale data
Author Prateek Mehta
Course Digital Marketing
Institution Institute of Business Management
Pages 16

Summary

market basket analysis for retail point of sale data...


Description

Data Mining Overview

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations. Below is the entity-relationship diagram depicting the relationships in the data shared for the project.

Challenges faced:

1. The data was split into 3 sets: (1) Point of Sale (PoS) data, (2) Customer data and (3) Pricing data. The PoS data set held the receipt no. and the articles purchased on each receipt by the customer. The difficulty was that this data set, at more than 3,367,020 objects of 3 variables, was too large to be massaged in Excel. Each receipt no. was unique and was recorded against every SKU bought on that receipt.
2. Summarization of this data could not be supported in Excel or Access. We tried, but were limited by the size of data set that Excel and Access support.
3. Within the data set, all unique SKUs had to be summarized on a per-receipt basis, which means that the SKUs needed to be unique identifiers.
4. The chronological order in which the SKUs were listed was important: what was purchased first and what followed or preceded it. This matters because it enables better merchandising planning.
5. Association rule mining was done using the Apriori algorithm, and the major challenge was to decide on the right balance of minimum length, support and confidence. It was difficult to arrive at the right mix of LHS and RHS with the required support and confidence. We needed to establish the right length so as to have the right no. of articles in a receipt (neither too many nor too few, based on the average SKUs per receipt).
6. Another challenge was to establish the RHS association so as to identify not only the fastest-moving article but also the least-moving article. The least-moving article had to be identified to surface the right mix of articles around which a promotion could be planned.
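The per-receipt summarization described in the challenges (one list of unique SKUs per receipt) can be sketched outside a spreadsheet in a few lines. This is a minimal Python sketch and not the report's own code: the function name `build_baskets`, the `(receipt_no, sku)` row layout and the sample rows are assumptions made for illustration.

```python
def build_baskets(rows):
    """Collapse (receipt_no, sku) rows into one list of unique SKUs per receipt.

    SKUs keep the order in which they first appear on the receipt.
    """
    baskets = {}
    for receipt_no, sku in rows:
        # A dict used as an ordered set: a repeated SKU is stored only once.
        baskets.setdefault(receipt_no, {})[sku] = None
    return {receipt_no: list(skus) for receipt_no, skus in baskets.items()}


# Hypothetical sample rows; SKU "39" appears twice on receipt 1001.
rows = [(1001, "39"), (1001, "48"), (1001, "39"), (1002, "38"), (1002, "110")]
baskets = build_baskets(rows)
print(baskets)
```

Because the rows are read one at a time, this approach is not bound by a spreadsheet row limit, and keeping first-appearance order retains the purchase sequence mentioned in challenge 4.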

Market Basket Analysis

Market Basket Analysis uncovers associations between products by looking for combinations of products that frequently co-occur in transactions. It allows supermarkets to identify relationships between the products that customers buy for various purposes.
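The strength of such a co-occurrence is usually expressed with three standard measures for a rule LHS => RHS: support (the share of transactions containing both sides), confidence (the share of LHS transactions that also contain the RHS) and lift (confidence divided by the overall share of transactions containing the RHS). The following Python sketch computes them on a toy set of baskets; the function name `rule_metrics` and the sample baskets are illustrative assumptions, not data from the project.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs => rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    n = len(transactions)
    n_lhs = sum(lhs <= t for t in transactions)            # baskets containing LHS
    n_rhs = sum(rhs <= t for t in transactions)            # baskets containing RHS
    n_both = sum((lhs | rhs) <= t for t in transactions)   # baskets containing both
    support = n_both / n
    confidence = n_both / n_lhs
    lift = confidence / (n_rhs / n)
    return support, confidence, lift


# Hypothetical baskets, each a set of item codes.
toy_baskets = [{"39", "48"}, {"38", "39"}, {"39", "41", "48"}, {"38"}]
support, confidence, lift = rule_metrics(toy_baskets, {"48"}, {"39"})
print(support, confidence, lift)
```

A lift above 1 means the two sides appear together more often than they would if they were bought independently.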

Retail Market Basket Data Set

This project analyses the retail market basket data set supplied by an anonymous Belgian retail supermarket store. The data were collected over three non-consecutive periods, resulting in approximately 5 months of data. The total number of receipts collected is 88,162. Over the entire data collection period, the supermarket store carried 16,470 unique SKUs (Stock Keeping Units). In total, 5,133 customers purchased at least one product in the supermarket during the data collection period.

Project Approach 

+ Grouping products that co-occur when designing a store's layout, to increase the chance of cross-selling. For this purpose, we use the Apriori, Eclat and Frequent Pattern algorithms to study customer behaviour.
+ Targeting marketing campaigns by sending promotional offers to customers related to the products they purchased.
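To make the first of the named algorithms concrete, here is a minimal sketch of the level-wise Apriori idea for frequent itemsets: keep the itemsets that meet a minimum support, join them into candidates one item larger, and discard any candidate that has an infrequent subset. It is written in plain Python for illustration and is not the arules implementation used in the project; the function name, the sample baskets and the 0.5 threshold are assumptions.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical baskets of item codes.
toy_baskets = [{"38", "39"}, {"38", "39", "48"}, {"39", "48"}, {"38", "39", "48"}]
frequent = apriori_frequent_itemsets(toy_baskets, min_support=0.5)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Raising `min_support` shrinks the result quickly, which is the trade-off between support, confidence and rule length that the challenges section describes.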

Algorithms and Packages used

+ arules
+ arulesViz
+ eclat
+ Frequent Pattern growth
+ bigmemory

Data Statistics

retail ...

##     lhs            rhs  support    confidence lift
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 72  {170,48}    => {38} 0.01744516 0.9877970  5.583878
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042
## 33  {170}       => {38} 0.03437989 0.9780574  5.528821
## 19  {110}       => {38} 0.03090901 0.9753042  5.513258
## 1   {37}        => {38} 0.01186452 0.9739292  5.505485
## 113 {36,39,48}  => {38} 0.01225018 0.9677419  5.470509
## 64  {36,48}     => {38} 0.01542615 0.9604520  5.429300
## 66  {36,39}     => {38} 0.02206166 0.9548355  5.397551
## 28  {36}        => {38} 0.03164629 0.9502725  5.371757
## 2   {286}       => {38} 0.01265852 0.9433643  5.332706
## 121 {38,39,48}  => {41} 0.02258343 0.3262865  1.924795
## 125 {32,39,48}  => {41} 0.01867018 0.3047020  1.797466
## 92  {38,48}     => {41} 0.02692770 0.2988419  1.762897
## 95  {38,39}     => {41} 0.03460675 0.2949251  1.739792
## 102 {32,39}     => {41} 0.02675756 0.2790065  1.645886
## 86  {39,89}     => {48} 0.02410336 0.7730084  1.617419

Sorting by confidence

inspect(sort(retailrules,by = "confidence")[1:20])
##     lhs            rhs  support    confidence lift
## 110 {110,39,48} => {38} 0.01169438 0.9942141  5.620153
## 116 {170,39,48} => {38} 0.01353191 0.9892206  5.591925
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 72  {170,48}    => {38} 0.01744516 0.9877970  5.583878
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042
## 33  {170}       => {38} 0.03437989 0.9780574  5.528821
## 19  {110}       => {38} 0.03090901 0.9753042  5.513258
## 1   {37}        => {38} 0.01186452 0.9739292  5.505485
## 113 {36,39,48}  => {38} 0.01225018 0.9677419  5.470509
## 64  {36,48}     => {38} 0.01542615 0.9604520  5.429300
## 66  {36,39}     => {38} 0.02206166 0.9548355  5.397551
## 28  {36}        => {38} 0.03164629 0.9502725  5.371757
## 2   {286}       => {38} 0.01265852 0.9433643  5.332706
## 119 {38,41,48}  => {39} 0.02258343 0.8386689  1.459077
## 105 {41,48}     => {39} 0.08355074 0.8168108  1.421049
## 83  {225,48}    => {39} 0.01587986 0.8064516  1.403027
## 123 {32,41,48}  => {39} 0.01867018 0.7978672  1.388092
## 79  {310,48}    => {39} 0.01527869 0.7960993  1.385016
## 111 {36,38,48}  => {39} 0.01225018 0.7941176  1.381569

The two orderings surface different sets of rules. We first inspected the rules sorted by lift: there, some rules rank highly on lift even though their confidence is low (for example, the rules with RHS {41} have a confidence of only about 0.3). In the second set of rules, sorted by confidence, the confidence is almost 1, which tells us that customers who purchased items "110, 39, 48" almost always also purchased "38", and so on.
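Lift and confidence are linked through the support of the RHS item: lift = confidence / support(RHS). The short Python check below, a sketch using only figures printed in this report, recovers the implied support of item "38" from two different rules and also confirms that a count of 1,031 receipts out of 88,162 gives the printed relative support. The variable and function names are illustrative.

```python
N_RECEIPTS = 88162  # total number of receipts, from the data set description

# (support, confidence, lift) as printed for two rules with RHS {38}
rule_110_39_48 = (0.01169438, 0.9942141, 5.620153)  # {110,39,48} => {38}
rule_37 = (0.01186452, 0.9739292, 5.505485)         # {37} => {38}


def implied_rhs_support(confidence, lift):
    """Since lift = confidence / support(RHS), support(RHS) = confidence / lift."""
    return confidence / lift


s38_a = implied_rhs_support(rule_110_39_48[1], rule_110_39_48[2])
s38_b = implied_rhs_support(rule_37[1], rule_37[2])
print(round(s38_a, 4), round(s38_b, 4))

# Relative support from the absolute count reported for {39, 48, 110} ==> {38}
relative_support = 1031 / N_RECEIPTS
print(round(relative_support, 8))
```

Both rules imply that item "38" appears on roughly 17.7% of receipts, so a confidence close to 1 corresponds to a lift of about 5.6. The same relation explains why rules whose RHS is a very common item cannot reach a high lift however high their confidence.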

Plotting the rules

plot(retailrules)

Graph method:
plot(head(sort(retailrules),10), method = "graph", control = list(type ="items"))

Grouped method:
plot(head(sort(retailrules),10), method = "grouped")

Matrix method:
plot(head(sort(retailrules),20), method = "matrix", measure = c("lift", "confidence"), control=list(reorder = T))
## Itemsets in Antecedent (LHS)
## "{170}" "{41}" "{39,41}" "{32,39}" "{39}" "{41,48}" "{38,41}" "{38,48}" "{48}" "{32}" "{38}" "{32,48}" "{39,48}" "{38,39}"
## Itemsets in Consequent (RHS)
## [1] "{39}" "{48}" "{41}" "{38}"

Double Decker:

samplerule ...
## 110 {110,39,48} => {38} 0.01169438 0.9942141  5.620153

plot(samplerule, method = "doubledecker", data = retail)

Looking at some interesting measures

im ...
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042
## 113 {36,39,48}  => {38} 0.01225018 0.9677419  5.470509
## 66  {36,39}     => {38} 0.02206166 0.9548355  5.397551

rules110 ...
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 19  {110}       => {38} 0.03090901 0.9753042  5.513258
## 108 {110,38,48} => {39} 0.01169438 0.7575312  1.317917
## 61  {110,48}    => {39} 0.01176244 0.7514493  1.307336

Writing the rules to a CSV file and converting the rule set to a data frame

write(retailrules, file = "Retail_Rules.csv", sep = ",", quote = TRUE, row.names = FALSE)
retailrules.df ...
## 55  {48}        => {39} 0.33055058 0.6916340  1.2032726
## 56  {39}        => {48} 0.33055058 0.5750765  1.2032726
## 54  {41}        => {39} 0.12946621 0.7637337  1.3287082
## 50  {38}        => {39} 0.11734080 0.6633111  1.1539977
## 53  {41}        => {48} 0.10228897 0.6034125  1.2625621
## 52  {32}        => {39} 0.09590300 0.5574603  0.9698434
## 51  {32}        => {48} 0.09112770 0.5297026  1.1083338
## 49  {38}        => {48} 0.09010685 0.5093614  1.0657723
## 105 {41,48}     => {39} 0.08355074 0.8168108  1.4210493
## 106 {39,41}     => {48} 0.08355074 0.6453478  1.3503063
## 107 {39,48}     => {41} 0.08355074 0.2527623  1.4910695
## 97  {38,48}     => {39} 0.06921349 0.7681269  1.3363513

Frequent Pattern Algorithm from SPMF

# Rules generated by Apriori
inspect(head(sort(retailrules, by = "lift")))
##     lhs            rhs  support    confidence lift
## 110 {110,39,48} => {38} 0.01169438 0.9942141  5.620153
## 116 {170,39,48} => {38} 0.01353191 0.9892206  5.591925
## 60  {110,39}    => {38} 0.01973639 0.9891984  5.591800
## 72  {170,48}    => {38} 0.01744516 0.9877970  5.583878
## 58  {110,48}    => {38} 0.01543749 0.9862319  5.575030
## 74  {170,39}    => {38} 0.02290102 0.9805731  5.543042

# Rules generated by Frequent Pattern
##37 ==> 38 #SUP: 1046 #CONF: 0.9739292364990689 #LIFT: 5.505485339076103
##110 ==> 38 #SUP: 2725 #CONF: 0.9753042233357194 #LIFT: 5.513257946763509
##170 ==> 38 #SUP: 3031 #CONF: 0.9780574378831881 #LIFT: 5.528821482345322
##39 110 ==> 38 #SUP: 1740 #CONF: 0.9891984081864695 #LIFT: 5.591799824476502
##39 170 ==> 38 #SUP: 2019 #CONF: 0.9805730937348227 #LIFT: 5.543042131947258
##48 110 ==> 38 #SUP: 1361 #CONF: 0.986231884057971 #LIFT: 5.575030479758838
##48 170 ==> 38 #SUP: 1538 #CONF: 0.9877970456005138 #LIFT: 5.583878118378591
##39 48 110 ==> 38 #SUP: 1031 #CONF: 0.9942140790742526 #LIFT: 5.62015270834472
##39 48 170 ==> 38 #SUP: 1193 #CONF: 0.9892205638474295 #LIFT: 5.591925067319639
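The Frequent Pattern output reports support as an absolute receipt count (#SUP), whereas the Apriori output reports it as a fraction, so the two are easier to compare once the count is divided by the 88,162 receipts. The following Python sketch parses one rule line in the format shown above; the function name `parse_spmf_rule`, the regular expression and the returned dictionary layout are assumptions for illustration and are not part of SPMF.

```python
import re

N_RECEIPTS = 88162  # total number of receipts, from the data set description

# Matches lines such as "39 48 110 ==> 38 #SUP: 1031 #CONF: 0.99 #LIFT: 5.62"
RULE_RE = re.compile(
    r"(?P<lhs>\d+(?: \d+)*) ==> (?P<rhs>\d+(?: \d+)*) "
    r"#SUP: (?P<sup>\d+) #CONF: (?P<conf>[\d.]+) #LIFT: (?P<lift>[\d.]+)"
)


def parse_spmf_rule(line, n_transactions=N_RECEIPTS):
    """Parse one rule line and convert its absolute support to a fraction."""
    text = line.strip().lstrip("#").strip()  # drop the leading "##" prefix
    match = RULE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised rule line: {line!r}")
    return {
        "lhs": frozenset(match["lhs"].split()),
        "rhs": frozenset(match["rhs"].split()),
        "support": int(match["sup"]) / n_transactions,
        "confidence": float(match["conf"]),
        "lift": float(match["lift"]),
    }


line = "##39 48 110 ==> 38 #SUP: 1031 #CONF: 0.9942140790742526 #LIFT: 5.62015270834472"
rule = parse_spmf_rule(line)
print(sorted(rule["lhs"]), sorted(rule["rhs"]), round(rule["support"], 8))
```

For this line the converted support rounds to 0.01169438, the same value Apriori prints for rule 110, {110,39,48} => {38}, which is the agreement between the two algorithms that the report relies on.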

Items with least lift and confidence

inspect(tail(sort(retailrules, by = "lift")))
##     lhs            rhs  support    confidence lift
## 27  {413}       => {39} 0.01281731 0.6010638  1.0457028
## 57  {110,38}    => {48} 0.01543749 0.4994495  1.0450331
## 20  {110}       => {48} 0.01565300 0.4939155  1.0334539
## 63  {36,38}     => {48} 0.01542615 0.4874552  1.0199365
## 29  {36}        => {48} 0.01606134 0.4822888  1.0091266
## 52  {32}        => {39} 0.09590300 0.5574603  0.9698434

How the rules could be useful:

+ From all 3 algorithms it is confirmed that "38" is the item sold along with "110, 39, 48, 170", with a lift value of >5 and confidence close to 1.
+ Consider promotional activity for the items which have a lift value of less than 2....

