GB 306 - Lecture notes 1 PDF

Title GB 306 - Lecture notes 1
Author Peter Magner
Course Business Analytics I
Institution University of Wisconsin-Madison
Pages 11

Summary

Prof: Richard Crabb...


Description

Session 7: Probability
9/26/18

Anything with a ’ after it means "not that". It's the same thing as something with the horizontal bar over it. All of them are intentional; none are typos.

Complement rule: P(A’) = 1 - P(A)
Conditional corollary: P(A’ | B) = 1 - P(A | B)
Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
Mutual exclusion: P(A and B) = 0, P(A | B) = 0, P(B | A) = 0
Independence rule: P(A and B) = P(A) * P(B), P(A | B) = P(A), P(B | A) = P(B)
Multiplication rule: P(A and B) = P(A) * P(B | A)

Events A and B are independent, and P(not A | B) = 65%
True: P(A | B) = 35%
0% < P(B) >

2-0, so you win the series
P(win next game) = 1/2
P(lose next game) = 1/2
You have a 75% chance of winning the $10 pot because you won game 1, so you take $7.50. Friend 2 gets $2.50.
Win next game
Lose next game: third game. Win chance is 1/2.
Lose then win = 1/4
Outcomes:
Win game 2 = 1/2
Lose game 2 then win game 3 = 1/4
Lose game 2 then lose game 3 = 1/4

What's the probability an athlete uses drugs if they test positive?
P(D) = .25, P(D’) = .75
P(T+ | D’) = .03, P(T- | D’) = .97
P(T- | D) = .07, P(T+ | D) = .93
P(D | T) = .62 (from prev lecture)
P(T) = P(D) * P(T | D) + P(D’) * P(T | D’) = .25 * .93 + .75 * .03 = .255
P(D | T) = P(T | D) * P(D) / P(T) = (.93 * .25) / .255 = .912
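The pot split and the drug-test calculation above can be checked with a few lines of Python. This is only a sketch: the function name `posterior` and the variable names are made up for illustration, and the series part assumes a best-of-three series with a 1/2 chance of winning each game, which is how the notes appear to set it up.

```python
from fractions import Fraction

# Series example (assumed: best of three, you lead 1-0, each game is 50/50).
half = Fraction(1, 2)
# You win the series if you win game 2, or lose game 2 and then win game 3.
p_win_series = half + half * half
pot = 10
my_share = pot * p_win_series          # fair share of the $10 pot
friend_share = pot - my_share

def posterior(prior, p_pos_given_d, p_pos_given_not_d):
    """Return (P(T+), P(D | T+)) using total probability and Bayes' rule."""
    p_pos = prior * p_pos_given_d + (1 - prior) * p_pos_given_not_d
    return p_pos, p_pos_given_d * prior / p_pos

# Drug-test numbers from the notes: P(D) = .25, P(T+ | D) = .93, P(T+ | D') = .03
p_pos, p_d_given_pos = posterior(0.25, 0.93, 0.03)

print(p_win_series, float(my_share), float(friend_share))
print(round(p_pos, 3), round(p_d_given_pos, 3))
```

The series part uses `Fraction` so that 1/2 + 1/4 stays exact; the Bayes part uses floats because the inputs are already rounded decimals.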

Medical Tests
Disease is 1 in 1000 people
Test is 99% accurate
You test positive. What's the chance you have the disease?
99%
90%
9% (CORRECT)
1%
False positives outweigh true positives

100K tested
100 are expected to have it, and 99 of them test positive: 99 were caught, but 1 wasn't.
99,900 don't have it. 1% of their tests are wrong, so that's 999 people.
99 / (99 + 999) = .09
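The head count above can be reproduced as a short Python sketch. It assumes, as the notes do, that "99% accurate" means the test is right 99% of the time for both sick and healthy people; the function name is hypothetical.

```python
def positive_test_breakdown(population, prevalence, accuracy):
    """Count true and false positives, and the share of positives who are sick."""
    sick = population * prevalence
    healthy = population - sick
    true_pos = sick * accuracy             # sick people the test catches
    false_pos = healthy * (1 - accuracy)   # healthy people flagged by mistake
    return true_pos, false_pos, true_pos / (true_pos + false_pos)

true_pos, false_pos, p_sick_given_pos = positive_test_breakdown(100_000, 1 / 1000, 0.99)
print(round(true_pos), round(false_pos), round(p_sick_given_pos, 2))
```

Because the disease is rare, the 999 false positives swamp the 99 true positives, which is why the answer is about 9% rather than 99%.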

P(ski) = .5, P(board) = .4, P(ski | board) = .2
P(ski or board) = P(ski) + P(board) - P(ski and board) = .5 + .4 - ?
P(ski and board) = P(board) * P(ski | board) = .4 * .2 = .08
P(ski or board) = .9 - .08 = .82

For Bayes' rule, P(D | T+) doesn't equal P(T+ | D).

Big Data

What is it?
Extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.

When did it become a big deal?
Around 2001

What countries are most interested in it?
Everyone. Early leaders were the US and UK.

Comes from…
Everything you do

Revenue and spending on big data
Estimated revenues of 150.8 B in 2017
Double-digit revenue growth observed and expected to continue into the next decade
Spending in excess of 72 B

The 3 V’s of big data
Volume: data size
All the data in the world between the beginning of time and 2008 = amount of data generated every minute today
Velocity: speed of data generation
Businesses need to respond at high speed
Everything is in real time
Variety: different forms of data
80% of the world’s data is unstructured, like text and media
Some add a 4th V: Veracity
Veracity is the idea that there's a lot of garbage data out there: poor quality and not reliable. Uncertainty due to data inconsistency and incompleteness.

Our working definition of big data:
Big data applies to info that is unusually high in volume, velocity, or variety

Big data claims
1. Data analysis produces incredibly accurate results
But the results can be false positives. If you look at enough data, you’ll always find interactions solely by chance (like correlations that are just chance)
2. Since every data point is captured, statistical sampling techniques aren't necessary
But do you have unbiased data? More data is not a cure if you're using a biased methodology to collect it
3. Statistical correlation tells us all we need to know
But causation matters. The world isn't static, and correlations may change over time
Ex. prior to 2007, bankruptcy was highly correlated with future losses
But after 2007, bankruptcy became uncorrelated with future losses
4. With enough data, the numbers speak for themselves
But spurious correlations and patterns exist by chance, and the more data, the more often these false relationships will appear
Ex. Google Flu tracking
Idea: use search terms to monitor the health of a regional population
User behavior changes over time, so search terms and frequency also change
Google’s search algorithm changes often
Media reports impact searches: more news coverage means more searches
So the correlations broke
Model worked really well for a short time, but then the correlations changed

Big data can offer insights, but there are big challenges
Correlation does not imply causality
Large sample sizes can still be problematic
Hard to analyze unstructured data (esp. video)
Concerns about privacy...
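The point made under claims 1 and 4, that looking at more data turns up more chance correlations, can be illustrated with a small simulation. This is a sketch with made-up names and sizes (20 observations, 5 versus 1000 candidate series), not something from the lecture: every series is pure random noise, so any correlation found is spurious by construction.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def strongest_chance_correlation(n_obs, n_series, seed=0):
    """Largest |r| between a random target and n_series unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best

few = strongest_chance_correlation(n_obs=20, n_series=5)
many = strongest_chance_correlation(n_obs=20, n_series=1000)
print(f"strongest |r| among 5 noise series:    {few:.2f}")
print(f"strongest |r| among 1000 noise series: {many:.2f}")
```

Because both runs share a seed, the 1000-series run contains the 5-series run as its first candidates, so its best correlation can only be as large or larger. None of the series has any real relationship to the target.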

