Spring 21 Chapter 1 Exs - Homework PDF

Title Spring 21 Chapter 1 Exs - Homework
Course Business Statistics
Institution Purdue University
Pages 18
File Size 615.5 KB
File Type PDF


CHAPTER 1 EXERCISES (Note: For all exercises in all chapters, when asked to confirm results or create something with Minitab, the Minitab output must be copied and pasted into your paper. Remember, you should do any requested by-hand calculation to make sure that you know how to, but these do not have to be in your homework paper.)

1. For each of the variables below, compute by hand (that is, with a calculator) their mean and standard deviation. Confirm your results with Minitab. (Type the numbers into separate columns in the Minitab worksheet.)

- For variables a, b, and c, what relationships do you see between the variable means? Between the variable standard deviations? Variable b is variable a multiplied by 10, so its mean and standard deviation are also multiplied by 10. Variable c is variable a plus 1, so its mean increases by 1 (from 50 to 51) but its standard deviation is unchanged (27.3861).

- Stat 225 Review for Extra credit (3 points): Note that Variable e = Variable a + Variable d. Note that the standard deviation of e ≠ the sum of the standard deviations of a and d. What is the correct relationship?
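The relationships asked about in Exercise 1 can also be checked outside Minitab. The following is a minimal sketch in Python, assuming the sample (n - 1) definitions of standard deviation, variance, and covariance; the helper `sample_cov` is a hypothetical name introduced only for this illustration. It uses the values of variables a through e listed in this exercise.

```python
from math import isclose
from statistics import mean, stdev, variance

# Variables from Exercise 1 (b, c, and e are built from a and d).
a = [10, 20, 30, 40, 50, 60, 70, 80, 90]
b = [10 * x for x in a]            # b = 10 * a
c = [x + 1 for x in a]             # c = a + 1
d = [5, 8, 13, 21, 34, 55, 89, 144, 233]
e = [x + y for x, y in zip(a, d)]  # e = a + d


def sample_cov(xs, ys):
    """Sample covariance with an n - 1 denominator (illustrative helper)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


for name, data in zip("abcde", (a, b, c, d, e)):
    print(f"{name}: mean = {mean(data):.4f}, SD = {stdev(data):.4f}")

# For a sum of two variables, variances (not standard deviations) combine,
# together with twice the covariance.
var_e_from_parts = variance(a) + variance(d) + 2 * sample_cov(a, d)
print("Var(e) equals Var(a) + Var(d) + 2 Cov(a, d):",
      isclose(variance(e), var_e_from_parts))
```

Under these assumptions, multiplying by a constant scales both the mean and the standard deviation, adding a constant shifts only the mean, and the standard deviation of a sum follows from the variances and the covariance rather than from adding the standard deviations.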

a. 10, 20, 30, 40, 50, 60, 70, 80, 90
Mean: 50 SD: 27.3861

b. 100, 200, 300, 400, 500, 600, 700, 800, 900
Mean: 500 SD: 273.8613

c. 11, 21, 31, 41, 51, 61, 71, 81, 91
Mean: 51 SD: 27.3861

d. 5, 8, 13, 21, 34, 55, 89, 144, 233
Mean: 66.8889 SD: 77.1029

e. 15, 28, 43, 61, 84, 115, 159, 224, 323
Mean: 116.8889 SD: 102.2735

2. a. For 2020-21, the minimum NBA player salary is $898,310. The mean salary is approximately $7.9 million, the median $3.8 million, and, at the time of this writing, the maximum salary is $43 million (Steph Curry). Do you believe the distribution of 2020-21 NBA player salaries is a normal distribution? Briefly explain your answer.

This is not a normal distribution. The minimum salary is $898,310 and the maximum is $43 million (Steph Curry), while the mean ($7.9 million) is far above the median ($3.8 million). This shows that the distribution is strongly skewed to the right, with more players closer to the minimum than to the maximum.

b. Consider the distribution of all household incomes in the state of Indiana. Briefly explain why you believe this is not a normal distribution.

It is not normally distributed: incomes cover a wide range, and a small number of very high incomes skew the distribution to the right.

3. a. Open the “How Much is a Win Worth” document. Evaluate this graph (which is from the Spring 2018 Exponent). Give two criticisms of the graph.

One criticism is that the graph is organized alphabetically, which has nothing to do with how much the Purdue coaches are making or how often they win. Another is that the sizes of the circles do not seem to mean anything: they appear to be arbitrary and are not related to the number of wins or to the amounts the coaches are making.

b. Open the UK 2016 Q4 GDP Growth graph. What is a major flaw in its construction? (Hint: Is the Second Estimate of the GDP growth three times bigger than the First Estimate?)

The graph does not have a full x-axis. The axis starts at 55% instead of 0%, which makes the gap between the two estimates look bigger even though they are only 10% off of each other.

4. Suppose the Pacers basketball team is averaging 108 points per game halfway through their season.

a. Is this a good prediction of exactly how many points they will score in their next game? Why or why not?

No, because an average does not tell the whole story of their games so far. They could have had a poor game one night, or one of their top scorers could have been out with an injury for a few games, which would have brought the average down. More statistics are needed to estimate how many points they will score.

b. To make such a prediction, what other information would you like to have?

Other useful information includes how many minutes each player played and how many points each player scored in each game. It would also help to see which teams they played and the scores of all those games, to check for outliers. The standard deviation, minimum, and maximum of the scores would also be useful.


5. In an upcoming basketball game, New Orleans at Utah, sports gamblers can bet on whether Zion Williamson, a player for New Orleans, will score more or fewer than 26.5 points. An analyst points out that in his 4 career games against Utah last year, Zion is averaging 23.5 points per game. What would you rather know?

I would rather know how many points he scored in each game, to see whether there are any outliers and how well he played in each of those games. In one game he could have been injured or played fewer minutes.

6. Tracy, an Indiana high school student, is considering applying to Purdue but notices the 16% admission rate for all applicants. What conditional probabilities should Tracy look at to get a better picture of the applicable acceptance rate? (Give two.)

She should look at the acceptance rate for in-state applicants, because out-of-state applicants bring the overall percentage down. She should also look at the acceptance rate for applicants with her GPA and SAT scores, to see where she falls compared to other students.

7. Read the two documents in the Washington State Climate Change Results file. Briefly explain how response bias could account for their inconsistency.

Since this survey was about climate change, many people might have decided not to take it. If only those who wanted to see a change took the survey, the results would make it look more likely that something would be set in place.

Open the Wine Quality dataset. Questions 8 - 13 concern this dataset.

8. Which variables are categorical and which are numerical? Write two questions you would try to answer about these wines using this dataset.

Categorical: producer
Numerical: citric acid, residual sugar, chlorides, free sulfur dioxide, sulfur dioxide, density, pH, sulphates, alcohol content, quality rating

How does residual sugar affect the quality rating?
Does the citric acid affect the pH level?

9. Use Minitab to construct a dotplot of the Quality Rating variable in the dataset.

10. Wines are typically rated by connoisseurs on a scale of 1 – 10 (10 for best). These three wine producers all produce wines of all types, including many 9 and 10 rated wines of the highest quality. Given this, what does the dotplot in Question 9 tell you about the group of wines in this sample? (Hint: Is this a random sample of all of their wines?)


No, this is not a random sample. These are clearly neither the high-end wines nor the low-end wines, since the quality ratings range only from 4 to 7. These are just middle-ground wines.

11. Use Minitab to compute a 6-number summary of the Residual Sugar variable. Compare the mean and the median.

a. What does this comparison suggest about the shape of the distribution of this variable?

Variable        Mean   Minimum  Q1     Median  Q3     Maximum
Residual Sugar  2.458  1.200    1.800  2.000   2.400  10.700

The mean (2.458) is somewhat greater than the median (2.000), so the distribution is close to symmetric but pulled to the right by an outlier or two on the high end.


b. Use Minitab to create a graph that will show you the shape of the distribution and give a brief description of the distribution. (Hint: Do a histogram.)


Skewed right by a slight amount. There is also an outlier at 12.

12. Construct boxplots of the Residual Sugar variable grouped by Producer. (Use: Boxplot -> One Y, With Groups.)

a. Which producer’s wines have the highest IQR of Residual Sugar?

Italy has the highest IQR of Residual Sugar.

b. Which producer produced the most wines that are outliers with regard to Residual Sugar?

California produced the most wines that are outliers with regard to Residual Sugar.

13. a. Standardize the Residual Sugar variable using Minitab. (Use the observed (that is, the variable’s) mean and std. dev.) Store the new variable (call it: Res. Sug.-Std) in an open column of your worksheet. Write here the first 4 standardized scores.

-0.409726 0.104267 -0.116016 -0.409726

b. Use Minitab to show that the mean of the standardized variable is 0 and the standard deviation is 1. (Why this is true of all standardized variables when standardization is done using the observed mean and standard deviation will be explained in the Chapter 1 solutions. To see why, consider Question 1.)

Variable       Mean    StDev
Res. Sug.-Std  -0.000  1.000
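The result in part b does not depend on the wine data. As a minimal sketch, the following Python snippet standardizes variable a from Question 1 (used here only because the Wine Quality dataset is not reproduced in this document) with its own observed mean and sample standard deviation, then recomputes the mean and standard deviation of the standardized values.

```python
from statistics import mean, stdev

# Variable a from Question 1, standing in for any numerical variable.
a = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# Standardize with the observed mean and sample standard deviation.
m, s = mean(a), stdev(a)
z = [(x - m) / s for x in a]

# Subtracting the mean centers the values at 0; dividing by the
# standard deviation rescales the spread to 1.
print(f"mean of z = {mean(z):.3f}, SD of z = {stdev(z):.3f}")
```

Subtracting the mean shifts the center to 0 without changing the spread, and dividing by the standard deviation rescales the spread to 1, which mirrors the shifting and scaling behavior seen in Question 1.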

c. Use Minitab to create a histogram of the standardized variable and compare it to your histogram from Question 11b. What do you notice?


Almost identical, but with a different scale.

14. The six numbers below are drawn from N(20, 10). Compute by hand their standard scores. (Just type in your answers; remember, by-hand calculations need not be shown.)

a. 29.5: 0.95
b. 38.0: 1.8
c. 22.5: 0.25
d. 33: 1.3
e. 19.75: -0.025
f. 25.5: 0.55

b. What relationship do you see between the number, its z-score, and the mean of the distribution?

As x gets bigger, the z-score gets bigger as well. If a number is greater than the mean, its z-score is positive, and if the number is less than the mean, its z-score is negative.

15. Recall that the numbers ±1.96 cut off the center 95% of N(0, 1). (See the graph below.)


[Graph: Distribution Plot, Normal, Mean=0, StDev=1. Horizontal axis X, vertical axis Density. The area between X = -1.96 and X = 1.96 is 0.9500.]

Recall that we say, therefore, that 1.96 is the 95% confidence multiplier for the standard normal. a. Use Minitab to create a graph showing that 1.645 is the 90% confidence multiplier for the standard normal.

b. Use Minitab to find the 80% confidence multiplier for the standard normal.
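The confidence multipliers in parts a and b can be cross-checked without Minitab. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `confidence_multiplier` is a hypothetical helper introduced for illustration. It relies on the fact that a central area of a given level leaves half of the remaining probability in each tail.

```python
from statistics import NormalDist


def confidence_multiplier(level):
    """Return z such that the central `level` of N(0, 1) lies in [-z, z]."""
    tail = (1 - level) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.95, 0.90, 0.80):
    print(f"{level:.0%} confidence multiplier: {confidence_multiplier(level):.3f}")
```

The standard library has no t distribution, so a cross-check of the t multipliers with 60 degrees of freedom would need a separate statistics package.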


c. Use Minitab to create a graph showing the 80% confidence multiplier for the t with 60 degrees of freedom. Briefly explain, using what we have discussed about the t distributions, why this number is greater than the number found in part b.

The t distribution has heavier tails than the standard normal, so its 80% confidence multiplier is greater than the standard normal's.

d. Will the 90% confidence multiplier for the t with 60 degrees of freedom be greater than or less than 1.645? (Hint: You may check this with Minitab.) Explain your answer using what we have discussed about the t distributions.

It will be greater than 1.645 because, as noted above, the t distribution has heavier tails.

16. Notice that in the standard normal, the number 1.96 cuts off an upper tail of 0.025. (And, by symmetry and centering, -1.96 cuts off a lower tail of 0.025.) Recall that we say that 1.96 is the 1-sided critical value for 0.025 in the standard normal (and the 2-sided critical value for 0.05 – the sum of the upper and lower tails). (See the graph below.)


[Graph: Distribution Plot, Normal, Mean=0, StDev=1. Horizontal axis X, vertical axis Density. The upper tail to the right of X = 1.960 has area 0.025.]

a. Use Minitab to create graphs showing the 1-sided critical values for 0.10 and 0.05 in the standard normal. Use: Graph – Probability Distribution Plot – View Probability – N(0, 1) – Shaded Area – Probability – Right Tail – (0.10 then repeat for 0.05).

b. Considering the graphs you created for part a, give the 2-sided critical values for 0.20 and for 0.10 in the standard normal. Briefly explain your answers.

Critical values for 0.20 = ±1.282
Critical values for 0.10 = ±1.645

A 2-sided critical value splits the probability equally between the two tails, so each tail holds half of it: 0.10 in each tail for 0.20, and 0.05 in each tail for 0.10. These are the 1-sided critical values from part a, taken on both sides.

c. Use Minitab to create graphs showing the 1-sided critical values for 0.10 and 0.05 in the t with 60 degrees of freedom. Briefly explain, using what we have discussed about the t distributions, why these values are greater than the corresponding values from part a. (Same as part a but change to t – Degrees of freedom: 60.)

d. Consider the probabilities, 0.01 and 0.005. In the standard normal, will the 1-sided critical value for 0.01 be greater than or less than the 1-sided critical value for 0.005? Briefly explain your answer.

The 1-sided critical value for 0.01 will be less than the one for 0.005: as the tail probability decreases, the critical value increases.

e. Consider the probabilities, 0.01 and 0.005. In the t with 60 degrees of freedom, will the 1-sided critical value for 0.01 be greater than or less than the 1-sided critical value for 0.005? Briefly explain your answer.

The 1-sided critical value for 0.01 is again less than the one for 0.005, for the same reason: as the tail probability decreases, the critical value increases.

17. Suppose that, for some hypothesis test (which we will discuss beginning in Chapter 5), we compute a test statistic of 1.88. Consider the graph below. It shows that the number 1.88 (picked at random for demonstration purposes) cuts off an upper tail of 0.03005 in the standard normal.


[Graph: Distribution Plot, Normal, Mean=0, StDev=1. Horizontal axis X, vertical axis Density. The upper tail to the right of X = 1.88 has area 0.03005.]

We say that 1.88 gives a 1-sided p-value of 0.03005 in the standard normal. That is, this upper tail area (shaded in red) is called the p-value. (By symmetry and centering, the 2-sided p-value would be 0.0601 – 2 times the 1-sided p-value.) a. Using Minitab, create a graph showing the 1-sided p-value for 1.509 in the standard normal. What is the 2-sided p-value for 1.509 in the standard normal?

The 1-sided p-value is 0.06565; multiplying by 2 gives a 2-sided p-value of 0.1313.

b. Using Minitab, create a graph showing the 1-sided p-value for 1.509 in the t with 60 degrees of freedom. What is the 2-sided p-value for 1.509 in the t with 60 degrees of freedom? Briefly explain, using what we have discussed about the t distributions, why these p-values are larger than the corresponding ones found in part a.

The 1-sided p-value is 0.06287; multiplying by 2 gives a 2-sided p-value of 0.12574. The p-values are greater than the ones found in part a because the shape of the t distribution depends on the degrees of freedom. Its standard deviation is always slightly larger than 1, and its tails are heavier, which makes the p-values of the t distribution greater than those of the normal distribution.

c. In the standard normal, will the number 1.388 give a greater or lesser 1-sided p-value than that given by 1.509? Briefly explain your answer.

The number 1.388 gives a larger p-value than 1.509 because 1.388 lies to the left of 1.509 on the distribution plot, so more area under the curve is shaded to its right.

d. In the t with 60 degrees of freedom, will the number 1.388 give a greater or lesser 1-sided p-value than that given by 1.509? Briefly explain your answer.

Just as in part c, 1.388 gives a larger 1-sided p-value than 1.509. The closer the x-value is to the mean, the larger the p-value will be.
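The standard normal tail areas used in Exercise 17 can be cross-checked with a short script. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the function name `upper_tail` is a hypothetical helper, and only the standard normal cases are covered because the standard library has no t distribution.

```python
from statistics import NormalDist


def upper_tail(z):
    """Area to the right of z under N(0, 1), i.e. the 1-sided p-value."""
    return 1 - NormalDist().cdf(z)


for z in (1.88, 1.509, 1.388):
    one_sided = upper_tail(z)
    # By symmetry, the 2-sided p-value is twice the 1-sided one.
    print(f"z = {z}: 1-sided = {one_sided:.5f}, 2-sided = {2 * one_sided:.5f}")
```

Because the upper tail shrinks as z moves away from the mean, the value for 1.388 comes out larger than the value for 1.509, in line with the reasoning in part c.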

18. Consider the Probability Distribution Plots below. The top plot is the standard normal, N(0, 1). The bottom plot is N(100, 15), the IQ distribution. a. In the top plot, which of the numbers is the test statistic? 1.696


b. In the top plot, which of the numbers is the p-value? 0.04494

c. In the bottom plot, the shaded red area (p-value) is identical to that of the top plot. Briefly explain why this is so.

The z-score of the value in the second graph is (125.44 - 100)/15 = 1.696, which is the test statistic of the first graph.
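The equality of the two shaded areas in Exercise 18 can be illustrated numerically. The sketch below, which assumes Python's standard-library `statistics.NormalDist`, computes the upper tail of N(100, 15) beyond 125.44 and the upper tail of N(0, 1) beyond the corresponding z-score.

```python
from math import isclose
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # the IQ distribution, N(100, 15)
std = NormalDist()                  # the standard normal, N(0, 1)

x = 125.44
z = (x - iq.mean) / iq.stdev        # standardize x

tail_iq = 1 - iq.cdf(x)             # area to the right of x in N(100, 15)
tail_std = 1 - std.cdf(z)           # area to the right of z in N(0, 1)

print(f"z = {z:.3f}")
print("tail areas match:", isclose(tail_iq, tail_std))
```

Standardizing only relabels the horizontal axis, so the area beyond a value in N(100, 15) is the same as the area beyond its z-score in N(0, 1).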


IMPORTANT NOTE on critical values vs. p-values:

- Note, a critical value is a number. It is obtained from a given probability and a distribution. It is the number that cuts off the given probability in the tail (or tails, if 2-sided) of the given distribution.

- A p-value is a probability. It is obtained from a number (such as a z-score or t-score, which are usually called test statistics) and is the tail probability (either 1-sided or 2-sided) that is cut off by that number.
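The two directions described in this note can be seen as inverse operations. As a minimal sketch for the 1-sided case in the standard normal, assuming Python's standard-library `statistics.NormalDist`, the snippet below goes from a probability to a critical value and then from that number back to a tail probability.

```python
from statistics import NormalDist

std = NormalDist()  # the standard normal, N(0, 1)

# Probability -> number: the 1-sided critical value for 0.025.
tail_probability = 0.025
critical_value = std.inv_cdf(1 - tail_probability)

# Number -> probability: the 1-sided p-value cut off by that number.
p_value = 1 - std.cdf(critical_value)

print(f"critical value for {tail_probability}: {critical_value:.3f}")
print(f"p-value for {critical_value:.3f}: {p_value:.3f}")
```

Starting from 0.025 gives the familiar 1.96, and feeding 1.96 back in returns the tail probability of 0.025.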
