Class Notes - BABI 9040 PDF

Title	Class Notes - BABI 9040
Course	Analytic Models for Business Decisions
Institution	British Columbia Institute of Technology


[email protected] https://workspace.bcit.ca/Citrix/WorkspaceWeb/ sob-babi.edu.bcit.ca babiodbc babi

Class One
March 2nd, 2020

Final Exam on Easter Monday - emailed to make alternate arrangements

ANALYTICS: the scientific process of transforming data into insights for making better decisions.

End to End Solution: 7 Step Process
- Business question framing (question)
- Analytic problem solving
- Data
- Methodology (approach) selection
- Model building
- Deployment
- Life cycle management

T Shaped Professional
- Ability to collaborate with various business functions and apply knowledge across disciplines + depth of expertise in analytics solutions

1. Business Problem Framing
a. Obtain problem statement and usability requirements
b. Identify stakeholders
c. Determine if the problem is amenable to an analytics solution
d. Refine the problem statement and delineate constraints
e. Define an initial set of business benefits
f. Obtain stakeholder agreement on the problem statement

2. Analytic problem solving
a. Reformulate problem statement as an analytics problem
b. Develop a proposed set of drivers and relationships to output
c. State the set of assumptions related to the problem
d. Define key metrics of success
e. Obtain stakeholder agreement

3. Data
a. Identify and prioritize data needs and sources
b. Acquire data
c. Harmonize, rescale, clean, and share data
d. Identify relationships in the data
e. Document and report findings (insights, results, business performance)
f. Refine the business and analytics problem statements

4. Methodology
a. Identify available problem solving approaches (methods)
b. Select software tools
c. Test approaches
d. Select approaches

5. Model building
a. Identify model structures
b. Run and evaluate model (divide dataset ~ 70/30)
c. Calibrate models and data
d. Integrate the models
e. Document and communicate findings (incl assumptions, limitations, constraints)

6. Deployment
a. Perform business validation of the model
b. Deliver a report with findings OR create model, usability, and system requirements for production
c. Deliver production model / system
d. Support deployment

7. Life cycle management
a. Document initial structure
b. Track model quality
c. Recalibrate and maintain the model
d. Support training activities
e. Evaluate the business benefit of the model over time

Stochastic Dominance: (?) higher $ for any given NPV
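Step 5b above mentions dividing the dataset roughly 70/30. A minimal Python sketch of that split, assuming the larger share is used for training and the rest for testing (the function name, the seed, and the stand-in data are illustrative, not from the course):

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (train, test)."""
    shuffled = list(rows)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 observations
train, test = train_test_split(records)
print(len(train), len(test))
```

Shuffling before cutting matters when the source file is sorted (for example by date or customer), otherwise the test set would not look like the training set.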

Intelligent Enterprise
- Customer Relationship Management (CRM) has two discrete yet connected parts: Analytical and Operational CRM
- To use a human analogy, Analytical CRM is the brain while Operational CRM is the nervous system of an organization

Brain (Analytical CRM)
- Integrates data into information
- Remembers past experiences
- Makes predictions about the future based on the past
- Makes decisions and sets priorities
- Segmentation is the main lens that drives understanding

Body (Operational CRM)
- Takes action and gets results
- Senses provide data about environment
- Reacts to environment
- Acts on directions / recommendations

Analytical CRM & Business Intelligence
- People
- Process
- Technology

Customer Information
- Data Warehouse
- Marketing Database
- Market Research
- Promotion History
- Profitability

Data Mining
- Predictive Modeling
- Segmentation
- Market Basket Analysis
- Customer Potential

Execution & Tracking
- Campaign Management
- Tracking & Measurement
- Test and Learn Discipline
- Sales Process & System

Lift Chart (uplift modelling)

week one queries

PromoInfo query
---------
select d.month, p.promoType,
       count(distinct p.PromoAwardedID) as PromoCount,
       sum(p.promoamt) as PromoCost
from wk1BudgetPromoDetails p
inner join dimdate d on d.DateKey = p.PromoAwardedDateKey
group by d.month, p.promoType
order by 1
-------------------

RevenueInfo query
--------
select month,
       count(distinct r.customerid) as CustomerCount,
       sum(r.profit) as Revenue
from wk1BudgetRevSumMonth r
group by month
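To see what the PromoInfo query returns, here is a small sketch that runs it against an in-memory SQLite database from Python. The table and column names come from the query itself; the column types, the promo types ('FreePlay', 'Dining'), and all the rows are made up for illustration, and the course database is SQL Server rather than SQLite. A second sort key is added only so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table dimdate (DateKey integer primary key, month integer);
    create table wk1BudgetPromoDetails (
        PromoAwardedID integer, promoType text,
        promoamt real, PromoAwardedDateKey integer);
    -- hypothetical rows, not course data
    insert into dimdate values (20200105, 1), (20200210, 2);
    insert into wk1BudgetPromoDetails values
        (1, 'FreePlay', 10.0, 20200105),
        (2, 'FreePlay', 15.0, 20200105),
        (3, 'Dining',   20.0, 20200105),
        (4, 'FreePlay',  5.0, 20200210);
""")

# One output row per (month, promoType): promos awarded and total cost.
promo_info = conn.execute("""
    select d.month, p.promoType,
           count(distinct p.PromoAwardedID) as PromoCount,
           sum(p.promoamt) as PromoCost
    from wk1BudgetPromoDetails p
    inner join dimdate d on d.DateKey = p.PromoAwardedDateKey
    group by d.month, p.promoType
    order by 1, 2
""").fetchall()

for row in promo_info:
    print(row)
```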

Class Two
March 9th, 2020

Review: End to End Solution: 7 Step Process
- Business question framing (question)
- Analytic problem solving
- Data
- Methodology (approach) selection
- Model building
- Deployment
- Life cycle management

Direct Mail Question:
- What did it cost previously - increase?
- Projected ROI - how does this catalogue affect sales?
- Response rate
- Profit margin

Average response rate = 2%
25% are likely to reorder within 6 months
Profit margin of clothes = 15%
Average order = $80.00

# of catalogues     100
Cost                32
Orders generated    2
Reorders            0.5
Rev                 200
Profit              30

Data Warehouses in Marketing → CRM

Peppers and Rogers - the One to One Future
• Identify: get to know the customers of a company, to collect reliable data about their preferences and how their needs can best be satisfied.
• Differentiate: distinguish the customers in terms of their lifetime value to the company, know them by their priorities in terms of their needs, and segment them into more restricted groups.
• Interact: know through which communication channel and by what means contact with the client is best made. Get the customer's attention by engaging with him in the ways he is known to enjoy the most.
• Customize: personalize the product or service to the customer individually. The knowledge that a company has about a customer needs to be put into practice, and the information held has to be taken into account in order to give the client exactly what he wants.

The Loyalty Effect:
- Long term, loyal customers are worth more than short term customers
- Acquisition cost

CRM Analytics Driving Customer Action
1. Gather Customer Data
2. Derive Customer Insight
3. Customer Level Action
4. Evaluate Response

Data Warehouse Example
FACTSALES:
- Date
- productID
- customerID
- ChannelID
- Amount
- Count
DIMPRODUCT
DIMCHANNEL
DIMDATE
DIMCUSTOMER

SQL:
SELECT [DISTINCT] select_expr [, select_expr ...]
[FROM table_references
  [WHERE where_condition]
  [GROUP BY {col_name | expr | position} [ASC | DESC], ... [WITH ROLLUP]]
  [HAVING where_condition]
  [ORDER BY {col_name | expr | position} [ASC | DESC], ...]
  [LIMIT {[offset,] row_count | row_count OFFSET offset}]]
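Going back to the Direct Mail Question from the start of Class Two, the figures (2 orders, 0.5 reorders, 200 revenue, 30 profit) can be reproduced with a few lines of Python. This is a sketch under two assumptions that the notes do not state outright: reorders are 25% of the original orders, and the 30 "Profit" is the 15% margin on revenue before the mailing cost of 32 is subtracted. The function and parameter names are illustrative.

```python
def direct_mail_economics(catalogues, campaign_cost, response_rate,
                          reorder_rate, average_order, profit_margin):
    """Return the campaign figures as a dict."""
    orders = catalogues * response_rate          # first-time orders
    reorders = orders * reorder_rate             # repeat orders within 6 months
    revenue = (orders + reorders) * average_order
    profit = revenue * profit_margin             # margin on merchandise only
    return {
        "orders": orders,
        "reorders": reorders,
        "revenue": revenue,
        "profit": profit,
        "net_after_mailing": profit - campaign_cost,  # assumption: cost not yet deducted
    }


result = direct_mail_economics(
    catalogues=100, campaign_cost=32.0, response_rate=0.02,
    reorder_rate=0.25, average_order=80.0, profit_margin=0.15,
)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

Under those assumptions the mailing slightly loses money per 100 catalogues (30 of margin against 32 of cost), which is why response rate and reorder rate are the levers to watch.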

    View: A view is a virtual table constructed for ease of use. - Kind of like a saved query - ‘A set of SQL in the background’  Case Statement - Like “if” on excel   ASSIGNMENT TWO: Due on Thursday - Select count (distinct[customerID]) as “customers”, [Sport] - FROM {babi...etc].[sumdaysports] - Where left(datekey,4) = 2014 - Group by [sport]  - Baseball = 9844

-

Basketball = 13547 etc  = 22871
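A note on reading the Assignment Two counts: count(distinct customerID) grouped by sport counts a customer once per sport, so the per-sport numbers can add up to more than the overall number of distinct customers. The sketch below shows this on made-up rows using SQLite from Python. The table and column names follow the assignment query; the rows are hypothetical, and substr() stands in for left() because SQLite has no left() function. Whether 22871 in the notes is the overall distinct count is a guess.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sumdaysports (customerID integer, Sport text, datekey integer);
    -- hypothetical rows: customer 1 plays both sports in 2014
    insert into sumdaysports values
        (1, 'Baseball',   20140103),
        (1, 'Baseball',   20140210),
        (1, 'Basketball', 20140105),
        (2, 'Baseball',   20140107),
        (3, 'Basketball', 20140301),
        (3, 'Basketball', 20130615);
""")

# Distinct customers per sport in 2014 (same shape as the assignment query).
by_sport = dict(conn.execute("""
    select Sport, count(distinct customerID) as customers
    from sumdaysports
    where substr(datekey, 1, 4) = '2014'
    group by Sport
""").fetchall())

# Distinct customers overall in 2014, with no grouping.
overall = conn.execute("""
    select count(distinct customerID)
    from sumdaysports
    where substr(datekey, 1, 4) = '2014'
""").fetchone()[0]

print(by_sport, overall)
```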

sum(table[order amount]) / distinctcount(table[customerkey])

DECLARE @todaysdate datetime
SET @todaysdate = '2014-01-29'

-- master query to build cust file
select c.customerkey,
       g.EnglishCountryRegionName,
       st.SalesTerritoryCountry+'_'+st.SalesTerritoryRegion as SalesTerritory,
       d30_Orders, d30_SalesAmountUSD,
       d360_Orders, d360_SalesAmountUSD,
       case when Bike_orders > 0 then 'HasBikes' else 'NoBikes' end as HasBikes,
       Bike_Orders, Bike_SalesAmountUSD,
       Clothing_Orders, Clothing_SalesAmountUSD,
       datediff(d, max(d.date), @todaysdate) as DaysSinceLastOrder,
       max(d.Date) as LastOrderDate,
       count(distinct s.SalesOrderNumber) as Orders,
       sum(s.salesamount * cr.averagerate) as SalesUSD,
       sum((s.salesamount-s.totalproductcost)*cr.AverageRate) as ProfitUSD
from FactInternetSales s
inner join dimcustomer c on c.customerkey = s.customerkey
inner join dimdatealt d on d.datekey = s.OrderDateKey
inner join dimgeography g on c.geographykey = g.geographykey
inner join DimSalesTerritory st on s.SalesTerritoryKey = st.SalesTerritoryKey
inner join FactCurrencyRate cr on s.CurrencyKey = cr.CurrencyKey and s.OrderDateKey = cr.DateKey
-- 30 day query
left outer join (
    select CustomerKey,
           count(distinct SalesOrderNumber) as d30_Orders,
           sum(salesamount * averagerate) as d30_SalesAmountUSD
    from FactInternetSales ss
    inner join FactCurrencyRate cr on ss.CurrencyKey = cr.CurrencyKey and OrderDateKey = DateKey
    where @todaysdate - OrderDate

Predictive Model Files > Clothing_store.xlsx
- Variable Explained: Week Three > Predictive Model Files > Clothing Store Variables.xlsx
- Build 3 Models predicting the variable RESP
- Use train and test sets
- Provide lift charts for each model
- Provide a write up outlining the steps taken

Use pivot tables, charts, etc to try to understand the story behind it

Data Mining:

Classification:
- Classify an observation into a category or class
- Most widely used classifiers
  - Logistic regression
  - Linear discriminant analysis
  - K-nearest neighbours
  - Decision trees
  - Support vector machine
- More computer intensive methods in week 4
  - Random forests
  - Boosting
  - Support vector machines

https://web.stanford.edu/class/stats202/content/lec8-cond.pdf

Bayes Classifier:
- Assigns each observation to the most likely class given predictor values
- Assign an observation to the class for which the probability is the largest

P(Y=j | X=x0)

Logistic Regression:
P(default = yes | balance) - ???

Linear Discriminant Analysis
Reduces dimensions by maximizing separability between known categories
- Alternative, less direct approach to estimating probability
- Model distribution of predictors X separately in each response class Y
- Use Bayes’ Theorem to flip into estimates for the conditional probability P(Y=k | X=x)
- Linear discriminant model is more stable in some situations:
  - When n is small and X is approximately normal in each class
  - When there are more than two response classes

K-Nearest Neighbours
- In practice, you typically don’t know the conditional distribution of Y given X
- Estimates the conditional distribution of Y given X, then classifies an observation to the class with the highest estimated probability
- Given positive integer K, identifies the K points in the training data closest to observation x0 (this set is represented by N0)
- Estimates the probability of class j as the fraction of points in N0 whose response values equal j
- Then applies Bayes rule and classifies the observation to the class with the largest probability

“Separating data from noise”

Confusion Matrix
- Predicted default status vs true default status

Sensitivity & Specificity
- Class specific performance
- Sensitivity - % of true positives identified
  - True positives / (true positives + false negatives)
- Specificity - % of true negatives identified
  - True negatives / (true negatives + false positives)

Threshold Parameter
- Assign observation to default class if P(default = yes | balance) > 0.5

- So threshold 50%
- If we want to capture more individuals who have a chance of defaulting, we can lower the threshold: P(default = yes | balance) > 0.2

ROC Curve
- Displays the two types of error for all possible thresholds
- Comes from communications theory - Receiver Operating Characteristic
- Overall performance of a classifier is given by the area under the ROC curve
- Y = Sensitivity (true positive rate), X = 1 - specificity (false positive rate)

Lift Chart
- Assess the ability of a model to detect events with two classes
- Ranks samples by their scores and determines the cumulative event rate as more samples are evaluated
- Lift is the number of samples detected by a model above a completely random selection of samples
- Plots the cumulative lift against the cumulative % of samples that have been screened

Summary: use ROC curve to compare different models, and then lift chart to decide which customers to focus on.

Tree Based Methods
- Segment the predictor space into a number of simple regions
- Simple, useful for interpretation, typically not as competitive
- Bagging, boosting, random forests
  - Combine multiple trees to get a consensus prediction
- Advantages:
  - Closely mirror human decision making process
  - Easily handle qualitative predictors without the need for dummy variables
- Disadvantages:
  - Do not have the same level of predictive accuracy
  - Unstable - small change in data can cause large change in final estimated tree
- Typically more useful for non linear models
- Bagging: average a set of observations to reduce variance
  - Build many training subsets from a population, build a separate prediction model with each set, and average the resulting predictions
  - No longer able to visualize the result as a single tree and read the importance of variables off it
- Boosting:
  - Use multiple training subsets, but trees are grown sequentially
  - Each tree is grown using information from previously grown trees
  - Learn slowly
  - Fit small tree to data -> fit another tree to residuals from the model -> add the new decision tree into fitted function to update the residuals -> repeat
- Random Forests: improvement over bagged trees with a tweak that decorrelates the trees
  - Build a number of decision trees on multiple training samples
  - At each split, a random sample of m predictors is chosen as split candidates from the full set of p predictors
  - So, at each split, the model is not allowed to consider a majority of available predictors
  - Helps prevent overfitting

Support Vector Machines (SVM)
- Developed in the computer science community
- Performs well in a variety of settings, considered one of the best out of the box classifiers
- Generalization of the maximal margin classifier
  - Maximal margin classifier requires classes be separated by a linear boundary
  - SVM is the extension to accommodate non-linear class boundaries
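The sensitivity, specificity, threshold, and lift ideas above can be checked on a tiny example. This is a minimal Python sketch on ten made-up scored observations; the scores, labels, and function names are all illustrative. Lift is computed here as a ratio (event rate among the top-ranked share divided by the overall event rate), which is one common convention and may differ from how the course tool draws its lift chart.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify as positive when score > threshold; return (sensitivity, specificity)."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score > threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def cumulative_lift(scores, labels, fraction):
    """Event rate in the top `fraction` of samples by score, relative to the base rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical model scores and true outcomes (1 = default, 0 = no default).
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.25, 0.15, 0.1, 0.05, 0.02]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

sens_50, spec_50 = sensitivity_specificity(scores, labels, 0.5)
sens_20, spec_20 = sensitivity_specificity(scores, labels, 0.2)
lift_top_20 = cumulative_lift(scores, labels, 0.2)

print(sens_50, spec_50)   # threshold 0.5
print(sens_20, spec_20)   # threshold 0.2
print(lift_top_20)        # lift when screening the top 20% by score
```

Lowering the threshold from 0.5 to 0.2 raises sensitivity and lowers specificity on this data, which is the trade-off the ROC curve traces out across all thresholds.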

THURSDAY LECTURE:

Monday, March 30th, 2020
Week 5, Lecture 1

Economics of Strategy

Strategy:
- A set of decisions about how an organization mobilizes / allocates resources
- The economics of strategy provides a useful framework to understand and analyze the underlying decisions

Goal of Strategy:
- Private → raise revenue and cut cost = profit maximization
  ( Price - Cost ) * Quantity
- Public → do more with less = value maximization
  ( Quality - Cost ) * Impact
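The private-sector formula (Price - Cost) * Quantity can be turned into a quick what-if calculation. A minimal Python sketch with made-up numbers, comparing the two levers of selling more units and widening the margin:

```python
def profit(price, cost, quantity):
    """Profit as (price - cost) * quantity, with per-unit price and cost."""
    return (price - cost) * quantity


# Hypothetical baseline: $10 price, $6 unit cost, 1000 units.
base = profit(price=10, cost=6, quantity=1000)
more_units = profit(price=10, cost=6, quantity=1200)     # sell 20% more units
higher_margin = profit(price=11, cost=6, quantity=1000)  # raise price by $1

print(base, more_units, higher_margin)
```

The public-sector version, (Quality - Cost) * Impact, has the same shape, so the same function works with quality in place of price and impact in place of quantity.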

How to increase profits
- Sell more units
- Increase profit margin

A firm must create value before it can capture it, by:
- Increasing benefits (WTP)
- Reducing costs
- Expanding quantity
- New markets
- Exposure
- Increasing usage
- Speeding up production

Value Creation and Capture
- Creating value does not guarantee profits
- The firm must capture value through the price it charges
- Fail if:
  - Competition drives down prices
  - Value created does not benefit the end user (paying customer)
- Increase captured value by
  - Becoming irreplaceable
  - Firms often cannot tell which customers have higher or lower WTP
  - Customer segmentation
  - Different product lines that appeal to different groups

Key Performance Indicators
- A set of measurable values to reflect how well an organization is executing its strategy (decisions)
- Effective: doing right things
- Efficient: doing things right
- Identify what matters, measure it, manage it
- Integral part of any performance management tool

Two Types:
- Result Indicators
  - Is the organization moving in the right direction?
  - Do not specify which actions have resulted in success or failure
  - E.g. profit, profit margin, market share, customer satisfaction
- Performance Indicators
  - Measure the actions and events leading to a result
  - Hold team accountable, give actionable information to improve
  - Non-financial only
  - E.g. turn around time, # of complaints

Implementing KPIs
- Partnership
  - Recognition by stakeholders; joint development
- Power transfer to front line
  - Enable staff to take immediate action to rectify situations negatively impacting KPIs
- Link performance to strategy
  - Strategy -> critical success factors -> indicators
- Trust & Culture
  - Not primarily a reward or penalty mechanism
  - Fear of measurement: misuse of performance management tool
  - Manage poor performance: help them improve or fire them

Balanced Scorecard
- A performance management framework to keep track of the execution of activities
- Comprises both financial and non-financial perspectives to get a balanced view of performance
- Classic BSC by Kaplan and Norton (1996):
  - Strategy map, scorecard of measures, targets, and initiatives



Web Analytics
- Site Traffic
  - Geography, technology, new vs returning
  - Referring site
  - Demographics
- Search – organic and paid ads
  - Understanding what search words drove traffic to your site
  - Understanding what Pay per Click (PPC) is working at what price
- Site Content and Usage
  - What pages are viewed the most
  - Site speed
  - Click maps and in page analytics
  - Finding sources of error

→ Sankey Chart: how traffic flows through site
→ Conversion Funnels
→ Web Optimization

Final Project:
- Data set
- Use Power BI

Exam:
1. SQL
2. Scenario question (logic → explanation)
3. Power Pivot + Power BI

Three hours - due on the 20th. We’ll have one week.

Creating a Dashboard that can quickly switch between different indicators
1. Clean and rename columns so they are all the same
2. Upload entire folder to Power BI
3. Make sure all the data is there
4. Open up ‘model’ view
5. Click ‘enter data’
6. Count the number of measures you’re going to want to include
7. Number them 1, 2, 3, etc. and name the different measures you want to include
8. Create a new measure
9. Metric Selected = SELECTEDVALUE('Metric Selection'[Metric], "Score")
10. Selected Metric = SWITCH(TRUE(),
    'Metric Selection'[Metric Selected] = "Score", AVERAGE('world-happiness'[Score]),
    'Metric Selection'[Metric Selected] = "Life Expectancy", AVERAGE('world-happiness'[Healthy life expectancy]),
    'Metric Selection'[Metric Selected] = "Generosity", AVERAGE('world-happiness'[Generosity]),
    'Metric Selection'[Metric Selected] = "GDP per capita", AVERAGE('world-happiness'[GDP per capita]),
    'Metric Selection'[Metric Selected] = "Freedom", AVERAGE('world-happiness'[Freedom to make life choices]),
    'Metric Selection'[Metric Selected] = "Corruption Perceptions", AVERAGE('world-happiness'[Perceptions of corruption]),
    AVERAGE('world-happiness'[Score]))
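The logic of steps 9 and 10 (pick the metric chosen in the slicer, default to "Score", then average the matching column) can be mirrored outside Power BI. This is a rough Python analogue, not DAX: the sample rows are invented, only three of the metrics are mapped, and the function name is illustrative.

```python
from statistics import mean

# Hypothetical rows standing in for the 'world-happiness' table.
world_happiness = [
    {"Score": 7.5, "Healthy life expectancy": 1.0, "Generosity": 0.25},
    {"Score": 5.5, "Healthy life expectancy": 0.5, "Generosity": 0.125},
]

# Slicer label -> column to average (the role SWITCH plays in step 10).
METRIC_COLUMNS = {
    "Score": "Score",
    "Life Expectancy": "Healthy life expectancy",
    "Generosity": "Generosity",
}


def selected_metric(rows, selected=None):
    """Average the column for the selected metric, falling back to Score."""
    # Step 9: no single selection -> use the default label "Score".
    metric = selected if selected is not None else "Score"
    # Step 10: an unrecognised label falls through to the final Score branch.
    column = METRIC_COLUMNS.get(metric, "Score")
    return mean(row[column] for row in rows)


print(selected_metric(world_happiness))                     # nothing selected
print(selected_metric(world_happiness, "Life Expectancy"))  # one metric selected
```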

Example 2:
- Create a new table called “Key Measures”
- Create new measures:
  - Total Gross Sales = SUMX(financials, financials[unitssold]*financials[saleprice])
  - Total Sales = SUMX(financials, (financials[saleprice]*financials[unitssold])-financials[discounts])
  - Total Sales v2 = SUMX(financials, [Total Gross Sales] - financials[discounts])
  - Net Profit = SUMX(financials, (financials[saleprice]*financials[unitssold]) - financials[discounts]-financials[COGS])
  - % Profit Margin = DIVIDE([Net Profit], [Total Sales])
  - Govt Net Profit = SUMX(FILTER(financials, financials[segment]="Government"), [Total Sales]-financials[COGS])
  - Net Profit Chnl Prtnr = CALCULATE([Net Profit], financials[Segment]="Channel Partners") DIDN’T WORK????
  - Govt COGS = VAR GovtProfit = FILTER(ALL(financials[Segment]), financials[Segment]="Government") RETURN CALCULATE(SUM(financials[COGS]), GovtProfit)
  - Cumulative Sales = CALCULATE([Total Sales], FILTER( ALLSELECTED(financials[date]), Financials [date]...
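To sanity-check the row-by-row measures from Example 2 by hand, the same arithmetic can be written in Python. This is a sketch on two invented rows, not the Power BI sample data; the column names follow the measures above, and returning None on a zero denominator imitates DIVIDE giving a blank instead of an error.

```python
# Hypothetical rows standing in for the financials table.
financials = [
    {"segment": "Government", "unitssold": 10, "saleprice": 20.0,
     "discounts": 10.0, "COGS": 100.0},
    {"segment": "Channel Partners", "unitssold": 5, "saleprice": 12.0,
     "discounts": 0.0, "COGS": 30.0},
]


def total_gross_sales(rows):
    # SUMX: evaluate units * price per row, then add up.
    return sum(r["unitssold"] * r["saleprice"] for r in rows)


def total_sales(rows):
    return sum(r["unitssold"] * r["saleprice"] - r["discounts"] for r in rows)


def net_profit(rows):
    return sum(r["unitssold"] * r["saleprice"] - r["discounts"] - r["COGS"]
               for r in rows)


def profit_margin(rows):
    sales = total_sales(rows)
    return net_profit(rows) / sales if sales else None  # blank when sales are zero


def only_segment(rows, segment):
    # Plays the role of FILTER(financials, financials[segment] = ...).
    return [r for r in rows if r["segment"] == segment]


print(total_gross_sales(financials), total_sales(financials))
print(net_profit(financials), profit_margin(financials))
print(net_profit(only_segment(financials, "Government")))
```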