MGMT 305 - Chap 1 Homework PDF

Title MGMT 305 - Chap 1 Homework
Course Business Statistics
Institution Purdue University
Pages 21

Summary

DR. GARY EVANS
Use these homework packets to make sure you are on track and understanding the material. These were completed using Minitab and the lecture notes given in class!...


Description


CHAPTER 1 EXERCISES (Note: For all exercises in all chapters, when asked to confirm results or create something with Minitab, the Minitab output must be copied and pasted into your paper. Remember, you should do any requested by-hand calculation to make sure that you know how to, but these do not have to be in your homework paper.)

1. For each of the variables below, compute by hand (that is, with a calculator) their mean and standard deviation. Confirm your results with Minitab. (Type the numbers into separate columns in the Minitab worksheet.) What relationships do you see?

a. 100, 200, 300, 400, 500, 600, 700, 800, 900

b. 99, 199, 299, 399, 499, 599, 699, 799, 899

c. 10, 20, 30, 40, 50, 60, 70, 80, 90

d. 110, 220, 330, 440, 550, 660, 770, 880, 990
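The by-hand results for the four variables above can also be cross-checked outside Minitab. The following is a minimal Python sketch, not part of the original packet. It assumes the sample (n - 1) standard deviation, which should match the StDev that Minitab reports, and the variable names are illustrative.

```python
from statistics import mean, stdev

# Data from parts a-d of Exercise 1.
a = [100, 200, 300, 400, 500, 600, 700, 800, 900]
b = [99, 199, 299, 399, 499, 599, 699, 799, 899]
c = [10, 20, 30, 40, 50, 60, 70, 80, 90]
d = [110, 220, 330, 440, 550, 660, 770, 880, 990]

datasets = {"a": a, "b": b, "c": c, "d": d}

# (mean, sample standard deviation) for each variable.
summary = {name: (mean(values), stdev(values)) for name, values in datasets.items()}

for name, (m, s) in summary.items():
    print(f"{name}: mean = {m:.2f}, sample std. dev. = {s:.2f}")
```

The pattern to look for: subtracting a constant (part b is part a minus 1) shifts the mean but leaves the standard deviation unchanged, while multiplying by a constant (part c is part a times 0.1, part d is part a times 1.1) multiplies both the mean and the standard deviation by that constant.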


Relationship: Parts a and b have the same standard deviation, because both increase in increments of 100; part b is part a shifted down by 1, so its mean is 1 lower. Part c has values equal to those of part a divided by 10. As such, its mean and standard deviation are equal to those of part a divided by 10. Part d increases in increments of 110 (each value is 1.1 times the corresponding value in part a) and thus has a larger mean and standard deviation than the other parts.

2. a. For 2019-20, the minimum NBA player salary (for rookies) was $898,000. The mean salary was approximately $6.4 million, the median $2.8 million, and, at the time of this writing, the maximum salary was $40.2 million (Steph Curry). Do you believe the distribution of 2019-20 NBA player salaries is normal? Briefly explain your answer.

Answer: It is not a normal distribution, because the mean ($6.4 million) is far larger than the median ($2.8 million). A normal distribution is symmetric about its mean, so its mean and median are equal, which is not the case here.

b. Consider the distribution of all household incomes in the state of Indiana. Briefly explain why you believe this is not a normal distribution.

Answer: It is very unlikely that the distribution would be symmetric about the mean, and I would also expect outliers to be present, which is an indicator of a distribution that is not “normal”.

3. Recently, ESPN reported that the average salary of an NFL place-kicker is $1.9 million, while the average salary for an NFL running back is only $1.7 million. This statistic was cited as evidence that, in the current NFL, running backs are considered less valuable than place-kickers. The claim being made based on this statistic is misleading. In what way? (Hint: Teams have 1 place-kicker and, usually, 5 or 6 running backs, one of whom is their lead back.)

Answer: This is misleading because the average salary of the 5 or 6 running backs on a team could be pulled down by RBs who are primarily backup players and make much less than the starters. On the flip side, with there being only one place-kicker, he would get more playing time and thus be paid more.

4. a. Open the “How Much is a Win Worth” document. Evaluate this graph (which is from the Spring 2018 Exponent). Give two criticisms of the graph.

1) There seems to be no scale for the size of the circles. The $2 million and $3 million circles look nearly the same, while the $154,837 circle is more than double the size of the $106,645 circle.

2) The graph does not answer the question “how much is a win worth” because it features coaches from all types of sports. It would do its job better if all the coaches were from the same sport, so that we could compare their salaries to their win percentages without the introduction of this third variable.

b. Open the UK 2016 Q4 GDP Growth graph. What is a major flaw in its construction? (Hint: Is the Second Estimate of the GDP growth three times bigger than the First Estimate?)

The scale of the graph is off because the axis starts at 0.55%. Because of this design, the second estimate appears to be 3 times larger than the first, while it is truly only 0.10% more.

5. Suppose the Pacers basketball team is averaging 108 points per game halfway through their season.


a. Is this a good prediction of exactly how many points they will score in their next game? Why or why not?

Answer: It is not a very good prediction because there are too many variables at play, and an average can be computed from any group of numbers, even if the game scores are nowhere near consistent.

b. To make such a prediction, what other information would you like to have?

Answer: We would need several other things to make a more accurate prediction, including the full 6-number summary and standard deviation, as well as the stats and rankings of the teams they have played so far and of the team they will play next. It is also important to know whether any players are or have been injured.

6. Tracy, an Indiana high school student, is considering applying to Purdue but notices the 16% admission rate for all applicants. What conditional probabilities should Tracy look at to get a better picture of the applicable acceptance rate? (Give two.)

Answer: Tracy should look into the probability of being admitted given that the student is female, as well as the probability of being admitted given that the student is from Indiana.

7. Read the two documents in the Washington State Climate Change Results file. Briefly explain how response bias could account for their inconsistency.

Answer: In the public survey, it is likely that people gave what they believed to be the correct answer, while in the secret vote they gave their true answer.

Open the Wine Quality dataset. Questions 8 - 13 concern this dataset.


8. Characterize the variables as either categorical or numerical. Write two questions you would try to answer about these wines using this dataset.

Answer: All of the variables are numerical with the exception of “producer”, which is categorical.
1) Do wines with higher residual sugars have a higher quality rating?
2) Which producer uses the highest alcohol content in their wine?

9. Use Minitab to construct a dotplot of the Quality Rating variable in the dataset.

10. Wines are typically rated by connoisseurs on a scale of 1 – 10 (10 for best). These three wine producers all produce wines of all types, including many 9 and 10 rated wines of the highest quality. Given this, what does the dotplot in Question 9 tell you about the group of wines in this sample? (Hint: Is this a random sample of all of their wines?)

Answer: This sample only contains wines rated 4-7, implying that the sample is not random.

11. Use Minitab to compute a 6-number summary of the Residual Sugar variable. Compare the mean and the median.

a. What does this comparison suggest about the shape of the distribution of this variable?

Answer: The comparison suggests that the distribution is right skewed because the mean is greater than the median.

b. Use Minitab to create a graph that will show you the shape of the distribution and give a brief description of the distribution. (Hint: Do a histogram.)


Answer: The distribution is right skewed with an outlier to the right.

12. Construct boxplots of the Residual Sugar variable grouped by Producer. (Use: Boxplot -> One Y, With Groups.)

a. Which producer’s wines have the highest IQR of Residual Sugar?

Answer: Italy

b. Which producer produced the most wines that are outliers with regard to Residual Sugar?

Answer: California

13. a. Standardize the Residual Sugar variable using Minitab. (Use the observed (that is, the variable’s) mean and std. dev.) Store the new variable (call it: Res Sug-Std) in an open column of your worksheet. Write here the first 8 standardized scores.

Answer: -0.409726, 0.104267, -0.116016, -0.409726, -0.409726, -0.483154, -0.630009, -0.923719

b. Use Minitab to show that the mean of the standardized variable is 0 and the standard deviation is 1. (Why this is true of all standardized variables when standardization is done using the observed mean and standard deviation will be explained in the Chapter 1 solutions. To see why, consider Question 1.)
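The hint to consider Question 1 can be illustrated numerically. The sketch below is an assumption-laden stand-in, not the Minitab steps: since the Wine Quality dataset is not reproduced here, it standardizes the Question 1a values instead, using their observed mean and sample standard deviation.

```python
from statistics import mean, stdev

# Stand-in data (Exercise 1a); the Residual Sugar column is not available here.
values = [100, 200, 300, 400, 500, 600, 700, 800, 900]

m = mean(values)    # observed mean
s = stdev(values)   # observed sample standard deviation

# Standardize: subtract the observed mean, then divide by the observed std. dev.
z = [(x - m) / s for x in values]

z_mean = mean(z)
z_sd = stdev(z)
print(f"mean of standardized values = {z_mean:.6f}")
print(f"std. dev. of standardized values = {z_sd:.6f}")
```

Subtracting the mean shifts the center to 0 without changing the spread, and dividing by the standard deviation rescales the spread to 1, which are the same shift and scale relationships seen in Question 1.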

c. Use Minitab to create a histogram of the standardized variable and compare it to your histogram from Question 11b. What do you notice?


Answer: They are identical.

14. For our company’s service call center, call duration is distributed N(22, 6) (minutes). To study this distribution, use Minitab to simulate taking a random sample of 200 service calls from N(22, 6). See below for instructions to do the simulation.**

a. Do a histogram of your sample. Describe its shape.


Answer: The graph is very close to being symmetric, with the mean and median being very similar, but it is slightly left skewed.

b. What are the z-scores of the maximum and minimum values in your sample? (Remember to use population (that is, distribution) parameters (22 and 6) to standardize.) Do these surprise you? Why or why not?

Answer: I am not surprised that the min and max are close in absolute value given the initial histogram, because of how symmetrical it appeared to be. I was surprised, however, that the results were not identical in absolute value, given that it is a “true standardized normal distribution”.

c. According to the empirical rule, how many of the values do you expect to be less than 10 or greater than 34? Briefly explain your answer.

Answer: 10 values, because 10 and 34 are equal to the mean minus and plus 2 sigma (22 ± 2 × 6). According to the empirical rule, about 95% of the values fall between them, so about 5% fall below 10 or above 34. 5% of 200 values is equal to 10.

d. Use Minitab to standardize all of your random numbers, then do a stem-and-leaf diagram of the standardized scores. (Remember, when you know the distribution, you standardize using the mean and standard deviation of the distribution, not of the sample.) Using the stem-and-leaf, state how many of the numbers in your sample actually are less than 10 or greater than 34.

Answer: After standardizing, the mean is 0 and sigma is 1, so 10 and 34 (the mean minus and plus 2 sigma) correspond to z-scores of -2 and +2. The stem-and-leaf shows 9 z-scores below -2 or above +2; therefore, there are 9 values less than 10 or greater than 34.

e. Since the times are distributed N(22, 6), we say that we expect the average duration of a service call to be 22 minutes; that is, 22 is the expected value (also, the mean) of the distribution. What is the average of your sample? Briefly explain (perhaps in one or two words) why it is not exactly equal to 22.

Answer: It's an estimate. The average of my sample is 21.788. It differs from 22 because a sample mean only estimates the expected value, and a random sample rarely matches it exactly. My histogram was not completely symmetric, so we could expect the average to vary slightly from the expected value.

f. What is the standard deviation of your sample? What did you expect the standard deviation of your sample to be? Briefly explain (in one word) why the standard deviation of your sample is not exactly equal to what you expected.

Answer: My standard deviation is 5.776. I expected it to be 6, because that is the value I entered into Minitab, but the sample standard deviation is only an estimate of 6 and varies from sample to sample.
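The reasoning in Exercise 14 can be reproduced with a short simulation. This is a minimal sketch rather than the Minitab procedure: it uses Python's standard library, and the seed value is an arbitrary assumption chosen only so the run is repeatable, so the sample will not match the one behind the answers above.

```python
from statistics import NormalDist, mean, stdev

MU, SIGMA, N = 22, 6, 200   # distribution parameters and sample size from Exercise 14

dist = NormalDist(MU, SIGMA)

# Expected count outside 10..34 (beyond 2 sigma from the mean).
expected_rule = 0.05 * N                                     # empirical rule: about 5%
expected_exact = (dist.cdf(10) + (1 - dist.cdf(34))) * N     # exact normal tail areas

# Simulate 200 call durations; the seed is arbitrary (an assumption for repeatability).
sample = dist.samples(N, seed=305)

# Standardize with the distribution's parameters, not the sample's.
z_scores = [(x - MU) / SIGMA for x in sample]
count_outside = sum(1 for z in z_scores if abs(z) > 2)

print(f"expected outside 2 sigma: {expected_rule:.1f} (rule), {expected_exact:.2f} (exact)")
print(f"observed outside 2 sigma: {count_outside}")
print(f"sample mean = {mean(sample):.3f}, sample std. dev. = {stdev(sample):.3f}")
print(f"z-scores of min and max: {min(z_scores):.2f}, {max(z_scores):.2f}")
```

Rerunning with different seeds gives slightly different sample means, standard deviations, and counts each time, which is the sampling variability that parts d, e, and f ask about.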

** To do the simulation use: Calc -> Random Data -> Normal -> Number of rows of data to generate: 200 -> set mean and standard deviation and storage column

15. The six numbers below are drawn from N(28, 3.5). Compute by hand their standard scores. (Just type in your answers; remember, by-hand calculations need not be shown.)

Number        Answer (standard score)
a. 27.5       a. -0.14
b. 35.0       b. 2
c. 28.5       c. 0.14
d. 33         d. 1.43
e. 23.5       e. -1.29
f. 25.5       f. -0.71
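The standard scores above follow from z = (x - mean) / standard deviation. As a quick check, here is a small Python sketch; the dictionary keys simply mirror the labels a through f, and rounding to two decimals is an assumption made to match the typed answers.

```python
MU, SIGMA = 28, 3.5   # parameters of N(28, 3.5) from Exercise 15

values = {"a": 27.5, "b": 35.0, "c": 28.5, "d": 33.0, "e": 23.5, "f": 25.5}

# Standard score: distance from the mean, measured in standard deviations.
z_scores = {label: round((x - MU) / SIGMA, 2) for label, x in values.items()}

for label, z in z_scores.items():
    print(f"{label}. x = {values[label]:>4}  z = {z:+.2f}")
```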

b. What relationship do you see between the number, its z-score, and the mean of the distribution?

Answer: If the number is smaller than the mean, its z-score is negative. If the number is larger than the mean, its z-score is positive. The farther away from the mean, the larger the absolute value of the z-score.

16. In a recent TV ad, it is claimed that 84% of chronic pain sufferers who used SuperGood4Pain reported relief from their pain. Assuming proper sampling of an adequate size and that this figure accurately reports the subjects’ claims, what key piece of missing information do we need to assess this claim?

Answer: We need to know if there was a group that was given a placebo, and the results of said group if it exists. If the placebo group also reported a high percentage of relief, then it is possible that the claim is not accurate.

17. Recall that the numbers ±1.96 cut off the center 95% of N(0, 1). (See the graph below.)


[Distribution Plot: Normal, Mean=0, StDev=1. Density against X, marking the central area 0.9500 between -1.96 and 1.96.]

Recall that we say, therefore, that 1.96 is the 95% confidence multiplier for the standard normal.

a. Use Minitab to create a graph showing that 2.576 is the 99% confidence multiplier for the standard normal.

b. Use Minitab to find the 90% confidence multiplier for the standard normal.


c. Use Minitab to create a graph showing the 90% confidence multiplier for the t with 29 degrees of freedom. Briefly explain, using what we have discussed about the t distributions, why this number is larger than 1.645.

Answer: The number is larger because the t distribution has heavier tails than the standard normal, so a larger value is needed to capture the same central area.

d. Will the 95% confidence multiplier for the t with 29 degrees of freedom be larger or smaller than 1.96? (Hint: You may check this with Minitab.) Explain your answer using what we have discussed about the t distributions.

Answer: The confidence multiplier will be larger than 1.96, because t distributions have heavier tails than the standard normal and therefore larger confidence multipliers for the same confidence level.

18. Notice that in the standard normal, the number 1.96 cuts off an upper tail of 0.025. (And, by symmetry, -1.96 cuts off a lower tail of 0.025.) Recall that we say that 1.96 is the 1-sided critical value for 0.025 in the standard normal (and the 2-sided critical value for 0.05 – the sum of the upper and lower tails). (See the graph below.)

[Distribution Plot: Normal, Mean=0, StDev=1. Density against X, marking an upper-tail area of 0.025 to the right of 1.960.]

a. Use Minitab to create graphs showing the 1-sided critical values for 0.05 and 0.005 in the standard normal. Use: Graph – Probability Distribution Plot – View Probability – N(0, 1) – Shaded Area – Probability – Right Tail – (0.05 then 0.005).


b. Considering the graphs you created for part a, give the 2-sided critical values for 0.10 and for 0.01 in the standard normal. Briefly explain your answers.

Answer: The 2-sided critical value for 0.10 is 1.645, and for 0.01 it is 2.576. A 2-sided probability is split evenly between the two tails, so these are the same as the 1-sided critical values for 0.05 and 0.005.

c. Use Minitab to create graphs showing the 1-sided critical values for 0.05 and 0.005 in the t with 29 degrees of freedom. Briefly explain, using what we have discussed about the t distributions, why these values are higher than the corresponding values from part a. (Same as a but change to t – Degrees of freedom: 29.)


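Answer: They are larger because the t distribution has heavier tails than the standard normal, so a larger value is needed to cut off the same tail probability.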
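The comparison between normal and t critical values can be checked numerically without Minitab. The sketch below is one possible approach, not the course's method: Python's standard library has no t distribution, so the helper functions `t_pdf`, `t_upper_tail`, and `t_critical` are hypothetical names that integrate the t density with Simpson's rule and invert the tail area by bisection.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on the density over [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # By symmetry, half the area lies above 0; subtract the area from 0 to t.
    return 0.5 - total * h / 3

def t_critical(p, df):
    """Value cutting off an upper tail of probability p, found by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid   # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

z = NormalDist()   # standard normal, N(0, 1)
critical = {}
for p in (0.05, 0.005):
    critical[p] = (z.inv_cdf(1 - p), t_critical(p, 29))
    print(f"upper tail {p}: z = {critical[p][0]:.3f}, t(29) = {critical[p][1]:.3f}")
```

For each tail probability the t(29) value comes out larger than the standard normal value, which is the heavy-tails effect described in the answer above.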


d. Consider the probabilities, 0.01 and 0.005. In the standard normal, will the critical value for 0.01 be greater than or less than the critical value for 0.005? Briefly explain your answer.

Answer: Less than. The tail for 0.01 is larger than the tail for 0.005, so the number that cuts it off lies closer to the center. This makes the critical value for 0.01 less than the critical value for 0.005.

e. Consider the probabilities, 0.01 and 0.005. In the t with 29 degrees of freedom, will the critical value for 0.01 be greater than or less than the critical value for 0.005? Briefly explain your answer.

Answer: Less than. The tail for 0.01 is larger than the tail for 0.005, so the number that cuts it off lies closer to the center. This makes the critical value for 0.01 less than the critical value for 0.005.

19. Suppose that, for some hypothesis test (which we will discuss beginning in Chapter 5), we compute a test statistic of 1.88. Consider the graph below. It shows that the number 1.88 (picked at random for demonstration purposes) cuts off an upper tail of 0.03005 in the standard normal.

[Distribution Plot: Normal, Mean=0, StDev=1. Density against X, marking an upper-tail area of 0.03005 to the right of 1.88.]

We say that 1.88 gives a 1-sided p-value of 0.03005 in the standard normal. That is, this upper tail area (shaded in red) is called the p-value. (By symmetry, the 2-sided p-value would be 0.0601, which is 2 times the 1-sided p-value.)

a. Using Minitab, create a graph showing the 1-sided p-value for 1.564 in the standard normal. What is the 2-sided p-value for 1.564 in the standard normal?

Answer: 0.11782

b. Using Minitab, create a graph showing the 1-sided p-value for 1.564 in the t with 29 degrees of freedom. What is the 2-sided p-value for 1.564 in the t with 29 degrees of freedom? Briefly explain, using what we have discussed about the t distributions, why these p-values are larger than the corresponding ones found in part a.

Answer: 0.12866; it is larger because the t distribution has heavier tails than the standard normal.

c. In the standard normal, will the number 1.310 give a higher or lower 1-sided p-value than that given by 1.564? Briefly explain your answer.

Answer: The number 1.310 will give a higher 1-sided p-value, because with the distance from the mean (center) being smaller, the tail probability is larger.

d. In the t with 29 degrees of freedom, will the number 1.310 give a higher or lower 1-sided p-value than that given by 1.564? Briefly explain your answer.

Answer: The number 1.310 will give a higher 1-sided p-value, because with the distance from the mean (center) being smaller, the tail probability is larger.

20. Consider the Probability Distribution Plot below.

a. Which of the numbers in it is the test statistic?

Answer: 1.696

b. Which of the numbers in it is the p-value?

Answer: 0.04494

IMPORTANT NOTE on critical values vs. p-values: A critical value is a number. It is obtained from a given probability and a distribution: it is the number that cuts off the given probability in the tail of the given distribution. A p-value is a probability. It is obtained from a number (such as a z-score or t-score, which are usually called test statistics) and is the tail probability that is cut off by that number.
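The note above describes two operations that undo each other, and that can be made concrete for the standard normal. In this hedged sketch the function names `critical_value` and `p_value` are illustrative choices, not Minitab commands, and only upper tails are handled.

```python
from statistics import NormalDist

z = NormalDist()   # standard normal, N(0, 1)

def critical_value(p):
    """Probability -> number: the value that cuts off an upper tail of p."""
    return z.inv_cdf(1 - p)

def p_value(stat):
    """Number -> probability: the upper-tail area beyond the test statistic."""
    return 1 - z.cdf(stat)

one_sided = p_value(1.88)         # Exercise 19: tail area beyond 1.88
two_sided = 2 * p_value(1.564)    # Exercise 19a: both tails beyond +/-1.564
round_trip = critical_value(one_sided)   # should recover 1.88

print(f"1-sided p-value for 1.88   = {one_sided:.5f}")
print(f"2-sided p-value for 1.564  = {two_sided:.5f}")
print(f"critical value for 0.025   = {critical_value(0.025):.3f}")
print(f"critical value for p(1.88) = {round_trip:.3f}")
```

Feeding a p-value back into `critical_value` returns the original test statistic, which is one way to remember which quantity is the number and which is the probability.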
