Exam 2 August, questions PDF

Title	Exam 2 August, questions
Course	Data Analytics
Institution	University of Newcastle (Australia)
Pages	38

The University of Western Australia
The Graduate School of Management

DATA ANALYSIS AND DECISION MAKING MGMT8504

Practice Substitution Test

Student Name______________________________ Student No.________________

Time: 2 hours plus 10 minutes reading time

Questions: This paper contains two parts:
Part A contains 20 multiple-choice questions, each worth 1 mark. Total Part A marks: 20
Part B contains 8 case studies with short answer questions. Marks are identified within each question. Total Part B marks: 80
Total marks: 100

Answer: Please attempt to answer ALL questions. Write your answers in the space provided in this booklet.

Pages: This booklet contains 38 pages including this page, statistical tables and solutions (please note solutions will not be provided with the actual test ☺).

Special Conditions: This is a closed-book test with only your calculators (non-programmable), statistical tables and formulae page available for assistance. Please note that if you pass this test (50% or better) you will be able to substitute another elective unit in place of the Data Analysis and Decision Making MGMT8504 unit.

GOOD LUCK EVERYONE!!!

PART A:

Multiple Choice questions (20 marks). Contains 20 questions, each worth 1 mark. Please attempt to answer ALL questions.

Circle the letter that corresponds to the most correct answer beside each of the following questions.

A1. The process of using sample statistics to draw conclusions about population parameters is called
a) inferential statistics.
b) experimentation.
c) primary sources.
d) descriptive statistics.
e) the scientific method.

A2. In analysing categorical data, the following graphical device is not appropriate
a) Pie chart.
b) Pareto diagram.
c) Stem and leaf display.
d) Bar chart.
e) They are all appropriate.

A3. The number of Singaporeans travelling to work by car today is an example of
a) discrete numerical data.
b) categorical data.
c) continuous numerical data.
d) discrete categorical data.
e) continuous categorical data.

A4. A summary measure that is computed from only a sample of the population is called
a) A parameter.
b) A census.
c) A statistic.
d) The scientific method.
e) All of the above.

A5. In a right-skewed distribution
a) The median, mean and mode are all equal.
b) The median and mode are both smaller than the mean.
c) The median and mode are both larger than the mean.
d) The distance between Q1 and the median equals the distance between Q3 and the median.
e) None of the above.

A6. Which of the following statements about the median is false?
a) It is a measure of a “typical” value.
b) It is equal to the mean in a “bell-shaped” normal distribution.
c) It is the average of the two middle values when an even number of data values are ranked.
d) It is more affected by extreme values than the mean.
e) It is equal to the mode in a “bell-shaped” normal distribution.

Questions A7 to A10 relate to the example below.

The average waiting time at the drive-thru section of a popular hamburger chain is 4 minutes with a standard deviation of 1 minute.

A7. What is the probability that a customer would have to wait longer than 3.5 minutes?
a) 0.5000
b) 0.2375
c) 0.7523
d) 0.3085
e) 0.6915

A8. If a sample of 100 customers (n = 100) is taken, what is the probability that the sample mean is less than 4.2 minutes?
a) 0.0228
b) 0.4207
c) 0.5793
d) 0.9772
e) 1.0000

A9. The same hamburger chain later surveys 1,000 customers to see if it should add beetroot to its Super Jumbo burger. The hamburger chain has 50 stores but, on the assumption that each store’s customers have similar tastes in hamburgers, randomly chooses only 10 of the stores to participate in the survey. Each store then samples 100 customers. What sampling method has the hamburger chain used?
a) Cluster sampling
b) Non-probability sampling
c) Simple random sampling
d) Stratified sampling
e) Systematic sampling

A10. The 1,000 customers sampled were also asked to estimate how much they spend per visit at this hamburger chain. From this sample data, it was found that on average customers spend $10.42 per visit with a standard deviation of $6.24. The correct 95% confidence interval for the mean spending per visit of all customers of this hamburger chain is:
a) 8.11 to 12.43
b) 4.18 to 16.66
c) 10.03 to 10.81
d) 9.72 to 11.36
e) 9.91 to 10.93
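
The calculations behind A7, A8 and A10 can be sketched in Python. This is only an illustrative sketch and not part of the original paper: it assumes waiting times are normally distributed (the example states only a mean and a standard deviation), it uses the standard library's statistics.NormalDist, and all variable names are made up for the illustration.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: drive-thru waiting times are normal with mean 4 and sd 1 (minutes).
wait = NormalDist(mu=4, sigma=1)

# A7: probability that one customer waits longer than 3.5 minutes.
p_longer_than_3_5 = 1 - wait.cdf(3.5)

# A8: the mean of n = 100 waits has standard error 1 / sqrt(100) = 0.1.
sample_mean = NormalDist(mu=4, sigma=1 / sqrt(100))
p_mean_below_4_2 = sample_mean.cdf(4.2)

# A10: 95% confidence interval for mean spend per visit.
# Assumption: z is used in place of t because n = 1000 is large.
n, xbar, s = 1000, 10.42, 6.24
z = NormalDist().inv_cdf(0.975)           # about 1.96
margin = z * s / sqrt(n)
ci = (xbar - margin, xbar + margin)

print(round(p_longer_than_3_5, 4))        # 0.6915
print(round(p_mean_below_4_2, 4))         # 0.9772
print(tuple(round(v, 2) for v in ci))     # (10.03, 10.81)
```

Working from printed z-tables gives the same figures to the precision of the answer options.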

A11. A group of researchers were attempting to determine whether female MBA graduates have a similar mean starting salary to male MBA graduates. What assumptions were necessary to conduct this hypothesis test?
a) Both populations of salaries (male and female) must have approximate normal distributions.
b) The population variances are approximately equal.
c) The samples were randomly and independently selected.
d) All of the above.
e) (a) and (b) only.

Questions A12 to A15 relate to the example below.

The membership controller at the Royal Big Bucks Golf Club believes that each member, on average, plays golf for more than 12 hours per week. To test his theory, the controller took a random sample of 16 golfers and asked them how many hours a week they played golf. The data was as follows: 12, 15, 10, 22, 7, 16, 8, 18, 17, 14, 13, 14, 8, 24, 18, 16. The mean number of hours the sample of golfers play for is 14.5 with a standard deviation of 4.844.

A12. State the null and alternative hypothesis to determine if the average number of hours played on the golf course is more than 12 hours per week.
a) H0: μ ≤ 12 hours and H1: μ > 12 hours
b) H0: μ ≥ 12 hours and H1: μ < 12 hours
c) H0: μ = 12 hours and H1: μ ≠ 12 hours
d) H0: x̄ ≤ 12 hours and H1: x̄ > 12 hours
e) H0: x̄ ≥ 12 hours and H1: x̄ < 12 hours

A13. To test the hypotheses the membership controller decided to construct a one-tailed hypothesis test with a 5% significance level (i.e. α = 0.05). What is the appropriate t-critical value for a sample of size 16?
a) 1.6450
b) 1.7459
c) 1.9600
d) 2.1315
e) 1.7531

A14. To test the hypotheses the membership controller decided to construct a one-tailed hypothesis test with a 5% significance level (i.e. α = 0.05). What is the appropriate t-statistic value?
a) 2.32
b) 2.06
c) 1.96
d) 1.80
e) 2.41

A15. Based on a 0.05 alpha level, the membership controller’s decision would be:
a) Reject H0, p-value < α = 0.05 and +t-statistic > +t-critical.
b) Reject H0, p-value > α = 0.05 and +t-statistic < +t-critical.
c) Not reject H0, p-value < α = 0.05 and +t-statistic > +t-critical.
d) Not reject H0, p-value > α = 0.05 and +t-statistic < +t-critical.
e) Throw his hands in the air and go and practise his putting.
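
The one-sample t-test that A12 to A15 walk through can be sketched as follows. This is a minimal illustration rather than the paper's own working: the critical value is hard-coded from a standard t-table (one tail, α = 0.05, 15 degrees of freedom) because the Python standard library has no t-distribution, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weekly hours played by the 16 sampled golfers (from the example above).
hours = [12, 15, 10, 22, 7, 16, 8, 18, 17, 14, 13, 14, 8, 24, 18, 16]
mu0 = 12                                  # hypothesised mean under H0

n = len(hours)
xbar = mean(hours)                        # 14.5
s = stdev(hours)                          # sample standard deviation, about 4.844
t_stat = (xbar - mu0) / (s / sqrt(n))

# Assumption: critical value read from a t-table (one tail, alpha = 0.05, df = 15).
t_crit = 1.7531

# Upper-tail test: reject H0 when the statistic exceeds the critical value.
reject_h0 = t_stat > t_crit
print(round(t_stat, 2), reject_h0)        # 2.06 True
```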

A16. A retail manager wants to predict sales of umbrellas (Y) (in $000s). The manager uses a combination of the following variables: daily maximum temperature (X1), number of customers in the shopping centre where the store is located (X2), friendliness of staff (X3), and whether or not the manager is walking the shop floor (X4). Which of these variables is most likely to be the strongest predictor of umbrella sales for this retail store?
a) X1
b) X2
c) X3
d) X4
e) All of these variables would have little effect on sales.

Questions A17 to A18 relate to the example below.

The manager of this retail store decides to statistically analyse the relationship between sales of umbrellas (SALES) and the daily maximum temperature (MAXTEMP). The regression model is:

SALES (predicted) = 520 - 5*MAXTEMP

The coefficient of correlation (R) for a random sample of 20 days between maximum temperature and umbrella sales is -0.70.

A17. Which of the following statements is not true?
a) Daily maximum temperature is a good predictor of umbrella sales.
b) A positive relationship exists between maximum temperature and umbrella sales.
c) The coefficient of determination (R²) is 0.49.
d) If the max. temperature is 22°C, umbrella sales are predicted to be $410.
e) If the max. temperature is 15°C, umbrella sales are predicted to be $445.

A18. Not satisfied with just considering the effect of daily maximum temperature on umbrella sales, the manager recorded the number of customers (CUSTOMERS) visiting the store, irrespective of whether the customer purchased an umbrella. The regression model is:

SALES (predicted) = 472 - 4*MAXTEMP + 3.5*CUSTOMERS

The new coefficient of determination is 0.59. Which of the following statements is not true?
a) ‘Umbrella sales’ is the dependent variable.
b) Assuming the maximum temperature is kept constant, the average effect of each extra customer is to increase sales by $3.5.
c) Assuming the number of customers is kept constant, the effect of a one-degree increase in maximum temperature is to increase sales by $4.
d) 59% of the variation in umbrella sales is explained by the combined variation in maximum temperature and number of customers.
e) Umbrella sales are negatively correlated with daily maximum temperature and positively correlated with the number of customers in the store.

A19. Use the model in A18 to predict SALES on a day when MAXTEMP = 19°C and CUSTOMERS = 220.
a) 6671
b) 1166
c) 2352
d) 472
e) 2743
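
The arithmetic in A17 to A19 amounts to plugging values into the two fitted equations. A short sketch is given here, using the one-predictor model from the A17 example and the two-predictor model given in A18; the function names are illustrative and not from the paper.

```python
def predict_simple(maxtemp):
    """SALES (predicted) = 520 - 5*MAXTEMP, the one-predictor model."""
    return 520 - 5 * maxtemp


def predict_multiple(maxtemp, customers):
    """SALES (predicted) = 472 - 4*MAXTEMP + 3.5*CUSTOMERS, the two-predictor model."""
    return 472 - 4 * maxtemp + 3.5 * customers


# Coefficient of determination for the simple model is the squared correlation.
r = -0.70
r_squared = r ** 2

print(predict_simple(22))           # 410
print(predict_simple(15))           # 445
print(round(r_squared, 2))          # 0.49
print(predict_multiple(19, 220))    # 1166.0
```

The negative slope on MAXTEMP in both models means predicted sales fall as the temperature rises, which is consistent with the negative correlation of -0.70.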

A20. The annual multiplicative time-series model possesses which components?
a) Cyclical, Seasonal and Trend only
b) Trend and Seasonal only
c) Irregular, Seasonal, Cyclical and Trend only
d) Cyclical, Seasonal and Irregular only
e) Trend, Cyclical and Irregular only

PART B:

Case Study Questions (80 marks). Contains 8 case studies with short answer questions. Please attempt to answer all questions.

Case Study A (6 marks). Please attempt to answer ALL questions.

The coaches of a local football team wanted to determine whether the players on their team were older than those of other teams. They recorded the ages of the players on two of the teams as follows:

Team A: 25, 25, 28, 30, 25, 25, 24, 25, 22, 21, 19, 20, 22, 33, 28, 25, 26, 30, 28, 22, 24, 21
Team B: 23, 23, 26, 27, 24, 20, 21, 23, 25, 21, 22, 29, 28, 29, 28, 27, 23, 23, 26, 32, 23, 20

A set of descriptive statistics was computed for both teams’ ages, along with five-number summaries and box plots (all shown below).

                            Team A     Team B
Mean                        24.909     24.682
Standard Error              0.756      0.694
Median                      25         23.5
Mode                        25         23
Standard Deviation          3.544      3.257
Sample Variance             12.563     10.608
Kurtosis                    -0.112     -0.496
Skewness                    XXXX       XXXX
Range                       14         12
Minimum                     19         20
Maximum                     33         32
Sum                         548        543
Count                       22         22
Confidence Level (95.0%)    1.571      1.444

Five-number Summary         Team A     Team B
Minimum                     19         20
First Quartile              22         23
Median                      25         23.5
Third Quartile              28         27
Maximum                     33         32

[Figure: Box Plots for Age, one box plot each for Team A and Team B. Axes: Age (scale 15 to 35) and Team.]

1. Given one of the aims of this analysis is to make inferences about the average age of all football players (there are 16 teams in the competition), what type of sampling scheme has been used here? Briefly explain. (2 marks)

2. For which team, Team A or Team B, is age more variable? Refer to and/or compute two different statistics to support your answer. (2 marks)

3. By using either descriptive statistics or box plots, comment on the skewness of both teams’ age distributions. (2 marks)
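
The spread and shape statistics that questions 2 and 3 refer to can be recomputed from the raw ages. The sketch below is illustrative only (the describe helper is a made-up name) and uses the Python standard library. It reports the standard deviation, range and coefficient of variation as measures of variability, and the mean alongside the median as one rough indicator of skew. Quartiles are left out because they depend on the interpolation method chosen.

```python
from statistics import mean, median, stdev

team_a = [25, 25, 28, 30, 25, 25, 24, 25, 22, 21, 19,
          20, 22, 33, 28, 25, 26, 30, 28, 22, 24, 21]
team_b = [23, 23, 26, 27, 24, 20, 21, 23, 25, 21, 22,
          29, 28, 29, 28, 27, 23, 23, 26, 32, 23, 20]


def describe(ages):
    """Return a few spread and shape indicators for a list of ages."""
    sd = stdev(ages)                      # sample standard deviation (n - 1)
    return {
        "mean": mean(ages),
        "median": median(ages),
        "sd": sd,
        "range": max(ages) - min(ages),
        "cv_percent": 100 * sd / mean(ages),   # coefficient of variation
    }


summary_a = describe(team_a)
summary_b = describe(team_b)
for name, summary in (("Team A", summary_a), ("Team B", summary_b)):
    print(name, {key: round(value, 3) for key, value in summary.items()})
```

The standard deviations and ranges printed here match the descriptive statistics reported for the two teams.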

Case Study B (6 marks). Please attempt to answer ALL questions.

Lucky Pizzas, where your food is presented with a smile and a poem, delivers its wide assortment of pizzas in an average time of 24 minutes with a standard deviation of 6 minutes. The delivery times are approximately normally distributed.

1. The owner of Lucky Pizzas has promised that any household waiting more than 33 minutes to receive its order will get the pizza(s) free. What is the probability of receiving a free pizza? (1 mark)

2. If 200 orders were received in one night, how many would you expect to take between 15 and 33 minutes to be delivered? Show all workings. (2 marks)

3. On a night in which 100 deliveries were made, the average waiting time was 25 minutes. Calculate the probability the average delivery time exceeds 25 minutes. (2 marks)

4. Does your answer to (3) indicate that the average delivery time is now significantly greater than 24 minutes? Briefly explain (use α = 0.05). (1 mark)

Note: In effect, you are testing H0: µ ≤ 24 vs H1: µ > 24.
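
The four parts of this case study all use the same normal model, so they can be sketched together. This is an illustrative sketch using the standard library's statistics.NormalDist, with made-up variable names. It treats the stated standard deviation of 6 minutes as the known population value when working with the mean of 100 deliveries.

```python
from math import sqrt
from statistics import NormalDist

# Delivery times: approximately normal, mean 24 minutes, sd 6 minutes.
delivery = NormalDist(mu=24, sigma=6)

# Part 1: probability a delivery takes more than 33 minutes (free pizza).
p_free = 1 - delivery.cdf(33)

# Part 2: expected number of 200 orders delivered in 15 to 33 minutes.
p_between = delivery.cdf(33) - delivery.cdf(15)
expected_orders = 200 * p_between

# Part 3: the mean of 100 deliveries has standard error 6 / sqrt(100) = 0.6.
mean_of_100 = NormalDist(mu=24, sigma=6 / sqrt(100))
p_mean_above_25 = 1 - mean_of_100.cdf(25)

# Part 4: upper-tail test of H0: mu <= 24, comparing the p-value with alpha.
alpha = 0.05
significant = p_mean_above_25 < alpha

print(round(p_free, 4))             # 0.0668
print(round(expected_orders, 1))    # 173.3
print(round(p_mean_above_25, 4))    # 0.0478
print(significant)                  # True
```

Working from a printed z-table with z rounded to 1.67 gives about 0.0475 for part 3, which leads to the same comparison with α = 0.05.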

Case Study C (9 marks). Please attempt to answer ALL questions.

The growing use of bicycles to commute to work has caused many cities to create exclusive bicycle lanes. These lanes are usually created by disallowing parking on streets that formerly allowed curb-side parking. Shop-owners on such streets complain that the removal of parking will cause their businesses to suffer. To examine this problem a mayor of a large city decided to launch an experiment on one busy street that had one-hour parking meters. The meters were removed and a bicycle lane was created. The mayor asked three businesses (a drycleaner, a doughnut shop, and a convenience store) in one block to record daily sales for two complete weeks (Sunday to Saturday) prior to the change and two complete weeks after the change, the assumption being that the removal of parking bays would result in fewer sales. The data are presented below:

             Drycleaner Sales    Doughnut Shop Sales    Convenience Store Sales
Day          Before    After     Before    After        Before    After
Sunday       195       173       319       317          307       287
Monday       194       204       347       331          393       390
Tuesday      146       153       306       301          407       394
Wednesday    186       184       316       306          352       314
Thursday     178       168       324       318          337       308
Friday       146       145       339       340          445       419
Saturday     161       141       272       248          440       429
Sunday       190       185       285       284          357       320
Monday       162       157       312       284          389       354
Tuesday      154       154       346       325          410       398
Wednesday    153       163       266       268          314       270
Thursday     172       175       309       282          359       339
Friday       174       170       315       268          425       380
Saturday     141       145       258       262          310       272

1. State the null hypothesis for the doughnut shop. Write the null hypothesis in words AND statistical notation. (2 marks)

2. State the alternative hypothesis for the doughnut shop. Write the alternative hypothesis in words AND statistical notation. (2 marks)
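
The same hypotheses apply to each of the three businesses, and the paired test they lead to can be sketched on the drycleaner's sales figures. This is an illustration under stated assumptions, not the paper's own working: the difference is taken as before minus after, the test is one-tailed in the direction "sales fell", and the critical value is hard-coded from a standard t-table (α = 0.05, 13 degrees of freedom). The variable names are made up.

```python
from math import sqrt
from statistics import mean, stdev

# Drycleaner daily sales for the 14 days before and after the change.
before = [195, 194, 146, 186, 178, 146, 161, 190, 162, 154, 153, 172, 174, 141]
after = [173, 204, 153, 184, 168, 145, 141, 185, 157, 154, 163, 175, 170, 145]

# Paired design: analyse the day-by-day differences (before minus after).
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
d_bar = mean(diffs)                          # mean difference
t_stat = d_bar / (stdev(diffs) / sqrt(n))    # paired t statistic, df = n - 1

# Assumption: critical value read from a t-table (one tail, alpha = 0.05, df = 13).
t_crit = 1.7709

reject_h0 = t_stat > t_crit
print(d_bar, round(t_stat, 2), reject_h0)    # 2.5 0.96 False
```

Pairing by day removes the day-of-week pattern in sales from the comparison, which is why the test works on the differences rather than on the two columns separately.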

Following is the EXCEL output. t-Test: Paired Two Sample for Means-- Drycleaners

Mean Variance Observations Pearson Correlation Hypothesized Mean Difference df t Stat P(T...