TEST 1 statistics practice questions (TEST 1 statistiek oefenvragen) PDF

Title	TEST 1 statistiek oefenvragen
Course	Statistiek voor bedrijfswetenschappen
Institution	Katholieke Universiteit Leuven
Pages	16
File Size	818.3 KB
File Type	PDF

Summary

Practice questions for the statistics course.
Very handy...

Description

TEST 1

1. A company manufacturing computer chips finds that 8% of all chips manufactured are defective. Management is concerned that high employee turnover is partially responsible for the high defect rate. In an effort to decrease the percentage of defective chips, management decides to provide additional training to those employees hired within the last year. After training was implemented, a sample of 450 chips revealed only 27 defects. What is the P-value for this proportion test?

o 0.0297
o 0.0594
o 0.1188
o 0.1638
o 0.5940
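
The P-value in question 1 can be checked in R. The sketch below assumes a one-sided test (the training is meant to lower the defect rate) and the normal approximation; the variable names are illustrative and not part of the quiz. It gives roughly 0.059, closest to the 0.0594 option, which is what a z-table returns when z is rounded to -1.56.

```r
# One-proportion z-test for question 1 (assumed H0: p = 0.08, HA: p < 0.08).
p0    <- 0.08      # defect rate before training (from the question)
n     <- 450       # chips sampled after training
x     <- 27        # defective chips found in the sample
p_hat <- x / n     # sample proportion

se      <- sqrt(p0 * (1 - p0) / n)  # standard error under H0
z       <- (p_hat - p0) / se        # test statistic
p_value <- pnorm(z)                 # lower-tail probability

print(round(c(z = z, p_value = p_value), 4))
```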

2. A company that sells eco-friendly cleaning products is concerned that only 19.5% of people who use such products select their brand. A marketing director suggests that the company invest in new advertising and labeling to strengthen its green image. The company decides to do so in a test market so that the effectiveness of the marketing campaign may be evaluated. Based on data collected in the test market, the company constructed a 98% confidence interval for the proportion of all consumers who might buy their brand. The resulting interval is 16% to 28%. What conclusion should the company reach about the new marketing campaign?

o None of these
o The new marketing campaign is effective in increasing the percentage of customers buying their brand.
o The data do provide convincing evidence that the marketing campaign increases the percentage of customers for the company’s products.
o The company should launch the new marketing campaign.
o The data do not provide convincing evidence that the marketing campaign increases the percentage of customers for the company’s products.

3. Which of the following R code statements calculates the critical value that is used to construct a 90% confidence interval?

o qnorm(0.05)
o pnorm(0.1, lower.tail = FALSE)
Correct! o qnorm(0.05, lower.tail = FALSE)
o pnorm(0.1)
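
As a quick check on question 3, the sketch below computes the critical value for a 90% confidence interval in two equivalent ways. The names `conf_level`, `alpha`, `z_star` and `z_star_alt` are illustrative.

```r
# Critical value z* for a two-sided 90% confidence interval.
conf_level <- 0.90
alpha      <- 1 - conf_level   # 0.10 in total, 0.05 in each tail

z_star     <- qnorm(alpha / 2, lower.tail = FALSE)  # upper-tail quantile
z_star_alt <- qnorm(1 - alpha / 2)                  # same value via the lower tail

print(round(z_star, 3))
```

Note that `pnorm()` returns a probability for a given z-value, so it cannot produce a critical value.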

4. The local newspaper of a small city wants to investigate the monthly wage of the inhabitants in the city. Therefore, the newspaper surveyed a random sample of all the inhabitants in the city. The answers of all respondents of the survey are stored in an R dataframe called “survey”. The following R console output is given:

The local newspaper now wants to test whether the proportion of males in the survey significantly differs from 50% with an alpha of 0.05. Which of the following R code statements correctly verifies this hypothesis?

o prop.test(sum(Gender=="F"), length(Gender), p = 0.5, alternative = "two.sided", conf.level = 0.975, correct = FALSE)
Correct! o prop.test(sum(Gender=="M"), length(Gender), p = 0.5, alternative = "two.sided", conf.level = 0.95, correct = FALSE)
o prop.test(sum(Gender=="M"), length(Gender), p = 0.5, alternative = "less", conf.level = 0.95, correct = FALSE)
o prop.test(sum(Gender=="F"), length(Gender), p = 0.5, alternative = "less", conf.level = 0.975, correct = FALSE)
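
To see the two-sided call from question 4 in action, here is a minimal sketch. The real `survey` data frame is not shown in the source, so the data below (58 males and 42 females) are invented purely for illustration.

```r
# Hypothetical stand-in for the "survey" data frame (made-up counts).
survey <- data.frame(Gender = c(rep("M", 58), rep("F", 42)))

# Two-sided test of H0: proportion of males = 0.5, without continuity correction.
result <- with(survey, prop.test(sum(Gender == "M"), length(Gender),
                                 p = 0.5, alternative = "two.sided",
                                 conf.level = 0.95, correct = FALSE))

# Compare the P-value with alpha = 0.05 to reach a decision.
reject_h0 <- result$p.value < 0.05
print(result$p.value)
print(reject_h0)
```

With these invented counts the test does not reject the null hypothesis; the real survey data may of course lead to a different decision.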

5. A recent poll of 120 adults who frequent the local farmer’s market found that 54 have purchased reusable cloth bags for their groceries. The 95% confidence interval for the proportion of adults who have purchased reusable cloth bags is

o 0.4383 to 0.4617
o 0.205 to 0.525
o 0.361 to 0.539
o 0.3856 to 0.4896
o 0.4046 to 0.4954
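
The interval in question 5 can be reproduced with the usual normal-approximation formula, p_hat ± z* × sqrt(p_hat(1 - p_hat)/n). The sketch below is one way to do it; the variable names are illustrative.

```r
# 95% confidence interval for a proportion (normal approximation).
x      <- 54                  # adults who bought reusable bags
n      <- 120                 # adults polled
p_hat  <- x / n               # sample proportion, 0.45
z_star <- qnorm(0.975)        # critical value for 95% confidence

se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
ci <- p_hat + c(-1, 1) * z_star * se  # lower and upper limits

print(round(ci, 3))
```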

6. The Quality Control Department wants to estimate the true percentage of rework for electrical components to within ±4%, with 99% confidence. Based on similar past studies, the percentage of rework was found to be 12%. How many components should they sample?

Correct! o 438
o 1000
o 579
o 344
o 651
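
The sample size in question 6 follows from solving the margin-of-error formula ME = z* × sqrt(p(1 - p)/n) for n. A minimal sketch, with illustrative variable names:

```r
# Required sample size for estimating a proportion to within a margin of error.
conf_level <- 0.99
me         <- 0.04    # desired margin of error (±4%)
p_guess    <- 0.12    # rework rate from similar past studies

z_star   <- qnorm(1 - (1 - conf_level) / 2)          # critical value for 99%
n_raw    <- z_star^2 * p_guess * (1 - p_guess) / me^2
n_needed <- ceiling(n_raw)                           # always round up

print(n_needed)
```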

7. One division of a large defense contractor manufactures telecommunication equipment for the military. This division reports that 12% of non-electrical components are reworked. Management wants to determine if this percentage is the same as the percentage rework for the company’s electrical components. The Quality Control Department plans to check a random sample of the over 10,000 electrical components manufactured across all divisions. The 95% confidence interval based on this data is .0758 to .1339. Should management conclude that the percentage of rework for electrical components is lower than the rate of 12% for non-electrical components?

o Yes, because 12% is contained within the 95% confidence interval.
o None of these
o No, because the upper limit of the confidence interval is 13.4%.
o No, because 12% is contained within the 95% confidence interval.
o Yes, because the lower limit of the confidence interval is 7.6%.

8. A report on the U.S. economy indicates that 28% of Americans have experienced difficulty in making mortgage payments. A news organization randomly sampled 400 Americans from 10 cities named the “fastest dying cities in the U.S.” (Forbes Magazine, August 2008) and found that 136 reported such difficulty. Does this indicate that the problem is more severe among these cities? The correct null and alternative hypotheses for testing this claim are

o H0: p > 0.28 and HA: p = 0.28
o H0: p = 0.28 and HA: p < 0.28
o H0: p = 0.28 and HA: p > 0.28
o H0: p = 0.28 and HA: p ≠ 0.28
o H0: p ≠ 0.28 and HA: p = 0.28

9. A national study report released by the Center for Studying Health System Change (HSC) in 2010 indicated that 20.9% of Americans were identified as having medical bill financial issues. Many people in families with problems paying medical bills in 2010 experienced severe financial consequences from their medical debt, with about two-thirds reporting problems paying for other necessities and a quarter considering bankruptcy, the study found. Suppose a news organization randomly sampled 400 Americans from 10 cities and found that 90 reported having such difficulty. A test was done to investigate whether the problem is more severe among these cities. The result of this test is

o There is insufficient evidence to conduct the test.
o There is evidence that the medical bill problem is more severe among the cities surveyed.
o Inconclusive.
o There is no indication the medical bill problem is more severe among the cities surveyed.
o None of these
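
The test in question 9 can be sketched as a one-sided, one-proportion z-test. The code below assumes H0: p = 0.209 against HA: p > 0.209 and a 5% significance level (the question does not state alpha); the variable names are illustrative.

```r
# One-sided z-test: is the proportion in the sampled cities above 20.9%?
p0    <- 0.209     # national proportion (from the question)
n     <- 400       # people sampled
x     <- 90        # people reporting medical bill problems
p_hat <- x / n     # sample proportion, 0.225

se      <- sqrt(p0 * (1 - p0) / n)          # standard error under H0
z       <- (p_hat - p0) / se                # test statistic
p_value <- pnorm(z, lower.tail = FALSE)     # upper-tail probability

alpha     <- 0.05                           # assumed significance level
reject_h0 <- p_value < alpha
print(round(c(z = z, p_value = p_value), 3))
```

Under these assumptions the P-value is well above 0.05, so the sample gives no indication that the problem is more severe in the cities surveyed.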

10. Top management of a large multinational corporation wants to create a culture of innovativeness and change. A consultant hired to assess the company’s organizational culture finds that only 15% of employees are open to new ideas and approaches toward their work. Consequently the company conducts a program for employees in order to reinforce the new corporate philosophy. Based on data collected after the program, the consultant finds the 95% confidence interval for the proportion of all employees open to new ideas to be 18% to 22%. What should the company conclude?

I. The null hypothesis should not be rejected.
II. There is no evidence to suggest that the program improved employees' attitudes toward innovativeness and change.
III. There is evidence that the program improved employees' attitudes toward innovativeness and change.

o Both I and II
o Both I and III
o I only
o II only
o III only

TEST 2

1. An online book store wants to determine if there is an association between coupon redemption and gender. After a special coupon broadcast to its reward members, the following data on coupon redemption at check out were collected. The P-value associated with the calculated Chi-square statistic is 0.0209. At α = 0.05 the correct conclusion is

I. to reject the null hypothesis.
II. to accept the null hypothesis.
III. to conclude that there is an association between coupon redemption and gender.

Correct! o Both I and III
o I only
o Both II and III
o II only
o III only

2. The transport authority “Transport for London” operates a public bike sharing-scheme that consists of 750 docking stations and 11,500 bikes. The bikes are available 24/7, 365 days a year and allow citizens and visitors to get around in London quickly and easily. Transport for London has collected a dataset that contains the number of bikes that were rented for a random selection of 300 days from the past 5 years. Besides the number of bikes rented (bikes_rented), the dataset also contains information on the weather (temperature, wind_speed, humidity, weather_code) and type of day (is_holiday, is_weekend, season). This dataset is already loaded into R as a dataframe called “bike_sharing”. The following R console output is given:

Which of the following R code statements correctly calculates a 99% confidence interval for the mean number of bikes rented on a given day?

Correct! o t.test(bikes_rented, conf.level = 0.99)
o t.test(bikes_rented, conf.level = 0.005)
o prop.test(bikes_rented, conf.level = 0.005)
o prop.test(bikes_rented, conf.level = 0.99)

3. A mid-sized company has decided to implement an enterprise resource planning (ERP) system, and management suspects that many of its employees are concerned about the planned implementation. Managers are considering holding informational workshops to help decrease anxiety levels among employees. They randomly select 16 employees to participate in a pilot workshop. These employees were given a questionnaire to measure anxiety levels about ERP before and after participating in the workshop. If we let d = post-workshop anxiety level - pre-workshop anxiety level, the correct alternative hypothesis to determine if this approach was successful is

o μ_d ≤ 0
o μ_d = 0
o μ_d ≠ 0
o μ_d < 0
o μ_d > 0

4. The transport authority “Transport for London” operates a public bike sharing-scheme that consists of 750 docking stations and 11,500 bikes. The bikes are available 24/7, 365 days a year and allow citizens and visitors to get around in London quickly and easily. Transport for London has collected a dataset that contains the number of bikes that were rented for a random selection of 300 days from the past 5 years. Besides the number of bikes rented (bikes_rented), the dataset also contains information on the weather (temperature, wind_speed, humidity, weather_code) and type of day (is_holiday, is_weekend, season). This dataset is already loaded into R as a dataframe called “bike_sharing”. The following R console output is given:

Transport for London now wants to test with 95% confidence whether the average number of bikes that are rented on a windy day (this means a wind speed of 20 km/h or more) is significantly higher compared to the other days. Which of the following R code statements can be used to perform this test correctly?

o t.test(wind_speed[bikes_rented >= 20], wind_speed[bikes_rented < 20], alternative = "greater", conf.level = .95)
Correct! o t.test(bikes_rented[wind_speed >= 20], bikes_rented[wind_speed < 20], alternative = "greater", conf.level = .95)
o t.test(bikes_rented[wind_speed >= 20], bikes_rented[wind_speed < 20], paired = TRUE, conf.level = .95)
o t.test(wind_speed[bikes_rented >= 20], wind_speed[bikes_rented < 20], paired = TRUE, conf.level = .95)

5. Grandma Gertrude's Chocolates, a family owned business, has an opportunity to supply its product for distribution through a large coffee house chain. However, the coffee house chain has certain specifications regarding cacao content as it wishes to advertise the health benefits (antioxidants) of the chocolate products it sells. In order to determine the mean % cacao in its dark chocolate products, quality inspectors sample 36 pieces. They find a sample mean of 55% with a standard deviation of 4%. The correct value of t* to construct a 90% confidence interval for the true mean % cacao is

Correct! o 1.690
o 1.318
o 2.030
o 2.797
o 1.711
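
The t* value in question 5 can be looked up with `qt()`. The sketch below uses n - 1 = 35 degrees of freedom and also builds the resulting interval from the figures in the question; the variable names are illustrative.

```r
# Critical t-value and 90% confidence interval for a mean.
n     <- 36      # pieces sampled
x_bar <- 55      # sample mean (% cacao)
s     <- 4       # sample standard deviation (% cacao)

t_star <- qt(0.95, df = n - 1)                     # 5% in each tail
ci     <- x_bar + c(-1, 1) * t_star * s / sqrt(n)  # lower and upper limits

print(round(t_star, 3))
print(round(ci, 2))
```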

6. Top management of a large multinational corporation wants to create a culture of innovativeness and change. A consultant hired to assess the company’s organizational culture finds that only 15% of employees are open to new ideas and approaches toward their work. Consequently the company conducts a program for employees in order to reinforce the new corporate philosophy. After the program is completed, employees are surveyed to see if a greater percentage is now open to innovativeness and change. The correct alternative hypothesis is

Correct! o p > 0.15
o p < 0.15
o p = 0.15
o μ ≠ 0.15
o μ > 0.15

7. The transport authority “Transport for London” operates a public bike sharing-scheme that consists of 750 docking stations and 11,500 bikes. The bikes are available 24/7, 365 days a year and allow citizens and visitors to get around in London quickly and easily. Transport for London has collected a dataset that contains the number of bikes that were rented for a random selection of 300 days from the past 5 years. Besides the number of bikes rented (bikes_rented), the dataset also contains information on the weather (temperature, wind_speed, humidity, weather_code) and type of day (is_holiday, is_weekend, season). This dataset is already loaded into R as a dataframe called “bike_sharing”. Transport for London now wants to verify whether or not the distribution of the number of bikes being rented on a given day is the same for each season. The following R code was used to perform this test:

Which R code statement should be used to retrieve the standardized residuals calculated by the test?

o test_results$observed
o test_results$expected
o test_results$components
Correct! o test_results$residuals
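
The code that produced `test_results` is not shown in the source. Assuming it came from `chisq.test()`, the sketch below uses an invented contingency table to show what `$residuals` contains: the Pearson residuals, (observed - expected) / sqrt(expected), which is the quantity the question calls standardized residuals. R also returns a separate `$stdres` component that additionally adjusts for the residual variance.

```r
# Hypothetical counts (made up for illustration; not the bike_sharing data).
counts <- matrix(c(30, 20, 25, 25,
                   20, 30, 25, 25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group  = c("A", "B"),
                                 season = c("spring", "summer", "autumn", "winter")))

test_results <- chisq.test(counts)

# Recompute the Pearson residuals by hand and compare with $residuals.
pearson <- (test_results$observed - test_results$expected) /
  sqrt(test_results$expected)

print(round(test_results$residuals, 3))
print(round(test_results$stdres, 3))
```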

8. Data were collected on annual personal time (in hours) taken by a random sample of 16 women and 7 men employed by a medium sized company. The women took an average of 24.75 hours of personal time per year with a standard deviation of 2.84 hours. The men took an average of 21.89 hours of personal time per year with a standard deviation of 3.29 hours. The standard error of the sampling distribution for the difference between the two means is

o 0.48
o 20.5
o 5.02
o 2.24
o 1.43
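
The standard error in question 8 follows from the unpooled two-sample formula, sqrt(s1²/n1 + s2²/n2). A minimal sketch with illustrative variable names:

```r
# Standard error of the difference between two independent sample means.
n_women <- 16;  s_women <- 2.84   # sample size and standard deviation, women
n_men   <- 7;   s_men   <- 3.29   # sample size and standard deviation, men

se_diff <- sqrt(s_women^2 / n_women + s_men^2 / n_men)

print(round(se_diff, 2))
```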

9. A manufacturing plant for recreational vehicles receives shipments from three different parts vendors. There has been a defect issue with some of the electrical wiring in the recreational vehicles manufactured at the plant. The plant manager believes that the defect issue is dependent on the parts vendor. The plant manager reviews a sample of quality assurance inspections from the last six months. The correct value of the test statistic for determining if the plant manager's belief is supported is

o χ² = 5.03
o χ² = 7.41
o χ² = 9.89
o χ² = 6.52
o χ² = 8.10

10. The transport authority “Transport for London” operates a public bike sharing-scheme that consists of 750 docking stations and 11,500 bikes. The bikes are available 24/7, 365 days a year and allow citizens and visitors to get around in London quickly and easily. Transport for London has collected a dataset that contains the number of bikes that were rented for a random selection of 300 days from the past 5 years. Besides the number of bikes rented (bikes_rented), the dataset also contains information on the weather (temperature, wind_speed, humidity, weather_code) and type of day (is_holiday, is_weekend, season). This dataset is already loaded into R as a dataframe called “bike_sharing”. The following R console output is given:

Transport for London now wants to test whether the average number of bikes that are rented on a weekend day is significantly larger than 10,000. They allow the probability of a type I error to be at most 5%. Which of the following R code statements can be used to perform this test correctly?

o t.test(bikes_rented[is_weekend == 1], mu = 10000, alternative = "greater", conf.level = 0.05)
o t.test(bikes_rented[is_weekend == 1], mu = 10000, alternative = "less", conf.level = 0.05)
Correct! o t.test(bikes_rented[is_weekend == 1], mu = 10000, alternative = "greater", conf.level = 0.95)
o t.test(bikes_rented[is_weekend == 1], mu = 10000, alternative = "less", conf.level = 0.95)
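
The role of `conf.level` in question 10 can be illustrated with simulated data. The real `bike_sharing` data frame is not shown in the source, so the values below are randomly generated stand-ins. The sketch shows that `conf.level` only sets the width of the reported confidence interval; the P-value is the same either way and is compared with alpha = 0.05 separately.

```r
# Hypothetical stand-in for bike_sharing (simulated, not the real data).
set.seed(1)
bike_sharing <- data.frame(
  bikes_rented = round(rnorm(300, mean = 10500, sd = 2000)),
  is_weekend   = rbinom(300, 1, 2 / 7)
)
weekend <- bike_sharing$bikes_rented[bike_sharing$is_weekend == 1]

# Same one-sided test, two different confidence levels.
fit_95 <- t.test(weekend, mu = 10000, alternative = "greater", conf.level = 0.95)
fit_05 <- t.test(weekend, mu = 10000, alternative = "greater", conf.level = 0.05)

print(fit_95$p.value)    # identical to fit_05$p.value
print(fit_95$conf.int)   # one-sided 95% interval, matching alpha = 0.05
```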

TEST 3

1. For the scatterplot shown below, the likely correlation coefficient is

Correct! o +0.35
o -0.89
o +0.77
o +0.90
o -1.00

2. When using a plot of residuals (y-axis) vs. fitted value of the dependent variable, a plot with no pattern indicates that the:

o nearly normal condition is satisfied
o linearity condition is not satisfied
o independence condition is not satisfied
o nearly normal condition is not satisfied
o equal spread condition is satisfied

3. A sample of 15 recently trained line workers was selected to determine if there is a relationship between the number of hours of training time received by production line workers and the time it took (in minutes) for them to troubleshoot their last process problem. Use the regression output for the independent variable shown below to find the 95% confidence interval for the slope of the regression equation.

Predictor   Coef      SE Coef   T        P
Training    -1.8360   0.1376    -13.35   0.000

o -3.611 to -0.069
o -4 to 0.32
o -2.1332 to -1.5388
o -1.9776 to -1.7224
o Can’t be determined with the information given.
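
The slope interval in question 3 is b1 ± t* × SE(b1) with n - 2 degrees of freedom. The sketch below plugs in the values from the regression output; the variable names are illustrative.

```r
# 95% confidence interval for a regression slope.
n    <- 15         # workers in the sample
b1   <- -1.8360    # estimated slope (Coef for Training)
se_b <- 0.1376     # standard error of the slope (SE Coef)

t_star <- qt(0.975, df = n - 2)          # two-sided 95%, 13 degrees of freedom
ci     <- b1 + c(-1, 1) * t_star * se_b  # lower and upper limits

print(round(t_star, 3))
print(round(ci, 4))
```

The result agrees with the "-2.1332 to -1.5388" option up to rounding of t*.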

4. A high leverage point

o Should probably be omitted
o Can hide in plots of residuals
o Can pull the regression line, making the slope appear smaller
o All of these
o Can be informative about the relationship between x and y

5. Transformation (re-expression) is NOT done to make the

o form of a scatterplot more nearly linear.
o spread of several groups more alike.
o relationship between x and y look better.
o distribution of a variable more symmetric.
o scatter in a scatterplot or residual plot spread out evenly rather than following a fan shape.

6. Suppose that 6 federal government economists and 7 university economists were asked to grade the effectiveness of an economic stimulus bill in terms of its ability to increase jobs over the next two years. The grades are shown in the table to the right. Using the appropriate nonparametric method, the calculated value of the test statistic is

o None of these
o T_1 = 62.5.
o T_1 = 28.5.
o T_1 = 54.
o T_1 = 30.

7. Suppose that ten new smart phone models were evaluated by two consumer electronics magazines (Popular Electronics and Electronics Now) from 1 (best) to 10 (worst) as shown below. Why is the Spearman rho more appropriate than the Pearson correlation for these data?

I. The relationship is nonlinear.
II. The data are ordinal.
III. The data are not paired.

o III only
o II only
o I, II, and III
o I only
o II and III only

8. The World Happiness Report is a landmark survey of the state of global happiness. The report calculates a happiness score for each country in the world based on answers to the main life evaluation question from nationally representative samples. A researcher is using these happiness scores to determine which variables are able to explain the state of happiness of a country. Therefore, she has built the following linear model in R:

Given the linear model that this researcher has fit, which of the following statements is false?

Correct! o There is a negative correlation between the state of happiness of a country and its urban population.
o Data on 139 countries were used to fit this linear model.
o We can reject the null hypothesis that there is no linear association between the state of happiness of a country and its urban population at a significance level of 5%.
o About 47.62% of the variation in h...