Additional Practice Problems PDF

Title Additional Practice Problems
Course Business Data Analytics
Institution McMaster University
Pages 14


Additional Practice Problems (Chapters 1 – 10)

Note: This document is only a set of “additional” practice problems drawn from different sources. It is not guaranteed to mirror the midterm exam exactly in structure, form, difficulty, number of questions, chapter split, coverage, etc. To prepare for the midterm, please review all the material provided in the course and the problems therein.

1- A measurement scale that rates product quality as either 1 = poor, 2 = average and 3 = good is known as:
A) interval
B) nominal
C) ordinal
D) ratio

2- A researcher used a procedure to select a sample of n objects from a population in such a way that each member of the population is chosen strictly by chance, is equally likely to be chosen, and every possible sample of size n has the same chance of selection. The procedure used by the researcher is known as:
A) inferential statistics
B) descriptive statistics
C) simple random sampling
D) None of the above

3- A recent survey asked respondents about their monthly purchases of raffle tickets. The monthly expenditures, in dollars, of ten people who play the raffle are 23, 15, 11, 20, 28, 35, 13, 10, 20, and 24. Which of the following statements is not true?
A) The median is equal to the mode
B) The mean is 19.9
C) The 75th percentile is equal to 23.5
D) The distribution is approximately symmetric.

4- What would you conclude if a sample correlation coefficient is equal to -1.00?
A) All the data points must fall exactly on a straight line with a negative slope.
B) Most of the data points must fall exactly on a straight line with a positive slope
C) Most of the data points must fall exactly on a horizontal straight line
D) All the data points must fall exactly on a straight line with a positive slope.

5- Which measures of central location are not affected by extremely small or extremely large data values?
A) mode and arithmetic mean
B) arithmetic mean and median
C) geometric mean and arithmetic mean
D) median and mode

6- For a sample of size 5, if \(x_1 - \bar{x} = -5\), \(x_2 - \bar{x} = 9\), \(x_3 - \bar{x} = -7\), \(x_4 - \bar{x} = -2\), \(x_5 - \bar{x} = 0\), then the sample standard deviation is:
A) 5.639
B) 6.066
C) 6.305
D) 6.782
Hint: \(s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}} = 6.305\)
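The hint for question 6 can be checked numerically. The following is a minimal Python sketch; the function name `sample_sd_from_deviations` is an illustrative assumption and not part of the course material.

```python
import math

def sample_sd_from_deviations(deviations):
    """Sample standard deviation computed from the deviations (x_i - x_bar)."""
    n = len(deviations)
    sum_sq = sum(d ** 2 for d in deviations)  # sum of squared deviations
    return math.sqrt(sum_sq / (n - 1))        # divide by n - 1, not n

# Deviations given in question 6
s = sample_sd_from_deviations([-5, 9, -7, -2, 0])
print(round(s, 3))  # 6.305
```

The sum of squared deviations is 159, so the sample variance is 159 / 4 = 39.75 and its square root is about 6.305.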

7- A z-score is a standardized value that indicates the number of standard deviations a value is from the:
A) Median
B) Mean
C) Mode
D) frequency

8- Which of the following statements is not true?
A) When events A and B are independent, then \(P(A \text{ and } B) = P(A) + P(B)\).
B) When events A and B are independent, then \(P(A \text{ and } B) = P(A) \cdot P(B)\).
C) When events A and B are independent, then \(P(A \text{ or } B) = P(A) + P(B) - P(A) \cdot P(B)\).
D) Events are independent when the occurrence of one event has no effect on the probability that another will occur.

9- A regression analysis between sales (\(y\) in $1000) and advertising (\(x\) in $) resulted in the following least squares line: \(\hat{y} = 80{,}000 + 4x\). This implies that:
A) an increase of $4 in advertising is expected to result in an increase of $4,000 in sales.
B) an increase of $1 in advertising is expected to result in an increase of $80,000 in sales.
C) an increase of $1 in advertising is expected to result in an increase of $4,000 in sales.
D) an increase of $1 in advertising is expected to result in an increase of $4 in sales.

10- Suppose we have the following information from a simple regression: \(b_1 = -2.39\), \(s_y = 5\) and \(s_x = 1.8\). What is the value for the coefficient of determination (\(R^2\))?
A) 0.32
B) 0.45
C) 0.74
D) 1
Hint: \(b_1 = r \frac{s_y}{s_x}\), \(R^2 = r^2\)
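The hint for question 10 amounts to solving the slope formula for r and squaring the result. A minimal Python sketch follows; the function name `r_squared_from_slope` is a hypothetical helper introduced only for illustration.

```python
def r_squared_from_slope(b1, s_x, s_y):
    """Recover r from b1 = r * s_y / s_x, then return R^2 = r^2."""
    r = b1 * s_x / s_y  # rearranged slope formula
    return r ** 2

# Values given in question 10
r2 = r_squared_from_slope(b1=-2.39, s_x=1.8, s_y=5)
print(round(r2, 2))  # 0.74
```

The sign of the slope carries over to r (about -0.86 here), but it disappears once r is squared.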

11- What does a correlation coefficient \(r = -0.8\) indicate?
A) It indicates that 64% of the variation in the dependent variable can be explained by the linear model.
B) It indicates that 8% of the variation in the dependent variable can be explained by the linear model.
C) It indicates that 16% of the variation in the dependent variable can be explained by the independent variable.
D) It indicates that 80% of the variation in the dependent variable can be explained by the linear model.

12- A company believes that there is a 40% chance of making a daily profit of $700, a 35% chance that it will be $800 and 25% chance that it will be $1000. What is the expected profit of this company?
A) 833.3
B) 810
C) 950
D) 1000
Hint: \(E(X) = \sum x P(x)\)

13- Travel time from home to work for a company employee follows a normal distribution with mean 40 minutes and standard deviation 8 minutes. What is the probability that on any day, her travel time from home to work is less than 35 minutes? (choose the closest answer)
A) approximately 0.72
B) approximately 0.43
C) approximately 0.34
D) approximately 0.26
Hint: \(P(Y \le 35)\), \(Y\) is Normal with mean and standard deviation 40 and 8, respectively. Convert to Z for answer.
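The "convert to Z" step in the hint for question 13 can be sketched in Python using the standard library's `statistics.NormalDist` in place of a printed Z table. The helper name `prob_below` is an assumption made for this example.

```python
from statistics import NormalDist

def prob_below(y, mean, sd):
    """P(Y <= y) for a normal Y, computed by standardizing to Z."""
    z = (y - mean) / sd          # z-score of the cutoff
    return NormalDist().cdf(z)   # standard normal cumulative probability

# Question 13: mean 40 minutes, standard deviation 8 minutes
p = prob_below(35, mean=40, sd=8)
print(round(p, 3))  # 0.266
```

Here z = (35 - 40) / 8 = -0.625, and the listed option closest to the resulting probability is the one to choose.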

14- The manager of a computer help desk operation in Moncton has collected enough data to conclude that the time per call is normally distributed with a mean equal to 10 minutes and a standard deviation of 2 minutes. The manager wants to set a time limit such that only 20% of all calls will exceed it. The time limit should be
A) about 11.68 minutes
B) about 15.24 minutes
C) about 5.22 minutes
D) about 3.12 minutes
Hint: \(P(Y \ge k) = 0.2\), \(P(Z \ge z) = 0.2\). Find z, convert to y for the answer.
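Question 14 runs the normal calculation in reverse: start from a tail probability, find z, then convert back to the original units. A minimal Python sketch, assuming the standard library's `statistics.NormalDist` stands in for the Z table; `upper_tail_cutoff` is a hypothetical helper name.

```python
from statistics import NormalDist

def upper_tail_cutoff(mean, sd, tail_prob):
    """Value k such that P(Y >= k) = tail_prob for a normal Y."""
    z = NormalDist().inv_cdf(1 - tail_prob)  # z with the given area to its right
    return mean + z * sd                     # convert z back to the Y scale

# Question 14: mean 10 minutes, standard deviation 2 minutes, 20% upper tail
k = upper_tail_cutoff(mean=10, sd=2, tail_prob=0.20)
print(round(k, 2))  # 11.68
```

The z-value with 20% of the area above it is about 0.84, so the limit is roughly 10 + 0.84 × 2 minutes.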

15- A courier service claims that only 10% of all of its deliveries arrive late. Assuming deliveries are independent, a sample of 8 deliveries is randomly selected. What is the probability that exactly 3 of the sample deliveries arrive late?
A) around 0.033
B) around 0.120
C) around 0.278
D) around 0.342
Hint: \(p = 0.1\), \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}\)

16- Suppose that we believe the mean income of people in a region is $40,000 with a standard deviation of $9,000 and it is assumed that the income is normally distributed. Suppose we take a sample of size 36 from this population. What is the probability that the average income of people in this sample is less than $37,600?
A) around 0.35
B) around 0.25
C) around 0.15
D) around 0.05
Hint: \(\mu = 40{,}000\), \(\sigma = 9000\), \(P(\bar{Y} \le 37600)\), remember \(SD(\bar{Y}) = \frac{\sigma}{\sqrt{n}}\)
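The two hints above (the binomial probability for question 15 and the standard deviation of a sample mean for question 16) can be sketched in Python as follows. Both function names are assumptions introduced for illustration, and `statistics.NormalDist` replaces the Z table.

```python
import math
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def prob_sample_mean_below(value, mu, sigma, n):
    """P(Y_bar <= value) when Y_bar is normal with SD sigma / sqrt(n)."""
    sd_mean = sigma / math.sqrt(n)  # standard deviation of the sample mean
    z = (value - mu) / sd_mean
    return NormalDist().cdf(z)

# Question 15: exactly 3 late deliveries out of 8, p = 0.1
print(round(binomial_pmf(3, n=8, p=0.1), 3))  # 0.033

# Question 16: sample of 36, mu = 40,000, sigma = 9,000
print(round(prob_sample_mean_below(37600, mu=40000, sigma=9000, n=36), 3))  # 0.055
```

In question 16 the standard deviation of the sample mean is 9000 / 6 = 1500, which gives z = -1.6.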

17- Suppose according to a recent study, we know that about 75% of Torontonians would pay extra to buy Canadian. Suppose 50 shoppers in Toronto are selected randomly, what is the probability that more than 70% of the shoppers in the sample would be willing to pay extra to buy Canadian?
A) around 0.8
B) around 0.6
C) around 0.4
D) around 0.2
Hint: \(p = 0.75\), \(n = 50\), \(P(\hat{p} \ge 0.7)\). Convert to Z to find the answer. Remember \(SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}\)
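The hint for question 17 can be sketched in Python as below. This assumes the normal approximation to the sampling distribution of the sample proportion, with no continuity correction; `prob_sample_proportion_above` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def prob_sample_proportion_above(threshold, p, n):
    """P(p_hat >= threshold) under the normal approximation."""
    sd_p_hat = math.sqrt(p * (1 - p) / n)  # SD of the sample proportion
    z = (threshold - p) / sd_p_hat
    return 1 - NormalDist().cdf(z)         # upper-tail probability

# Question 17: p = 0.75, n = 50, threshold 0.70
prob = prob_sample_proportion_above(0.70, p=0.75, n=50)
print(round(prob, 2))  # 0.79
```

The threshold lies below the population proportion, so z is negative (about -0.82) and the upper-tail probability is well above one half.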

18- All of the following are assumptions required for the central limit theorem (CLT) except
A) The independence assumption
B) The sample size assumption
C) The linearity assumption
D) The randomization assumption

19- The entrance exam for business schools, the GMAT, given to 100 students had a mean of 520 and a standard deviation of 120. What was the standard error for the mean of this sample of students?
A) 120
B) 12
C) 1.2
D) 0.12
Hint: \(SE(\bar{Y}) = \frac{s}{\sqrt{n}}\)

20- Consider the following frequency distribution generated by Excel. What is the missing cumulative % value identified by the asterisk?

Bin      Frequency   Cumulative %
12.8     1           5%
41.6     5           30%
70.4     6           60%
99.2     6           *
More     2           100%

A) 85%
B) 90%
C) 100%
D) 60%
Solution: Having the frequency and the total number you can find the % for each bin. For the fourth bin the % is 6/20. With that we can find the cumulative %.

21- What type of graph does a stem-and-leaf resemble when turned vertically?
A) pie chart
B) line chart
C) histogram
D) scatter plot

22- Which of the following is a categorical variable?
A) eye color
B) daily sales in a store
C) bank account balance
D) tire pressure

23- Which of the following best describes the data: zip codes for students attending Glenville College?
A) Quantitative
B) time-series
C) qualitative
D) numerical

24- At a large company, the majority of the employees earn from $22,000 to $32,000 per year. Middle management employees earn between $32,000 and $52,000 per year while top management earn between $54,000 and $104,000 per year. A histogram of all salaries would have which of the following shapes?
A) Uniform
B) skewed to left
C) symmetrical
D) skewed to right

25- Consider the following frequency distribution generated by Excel. What proportion of these values are less than 63?

Bin      Frequency   Cumulative %
26       0           0.00%
44.5     5           25.00%
63       7           60.00%
81.5     1           65.00%
More     7           100.00%

A) 25%
B) 65%
C) 60%
D) 35%

26- A baseball player is assigned the number 15 and another one the number 25. This is an example of ________ data.
A) qualitative
B) ratio levels
C) quantitative
D) interval

27- A professor needs to select a volunteer for a project. Which of the following would not be an example of a simple random sample?
A) He chooses that individual whose name is first in alphabetical order.
B) He has each student select a number between 0 and 99 and write it down. He then selects the student whose number is closest to the last two digits of his social security number.
C) He puts all student names in a bowl, mixes them up, and selects one.
D) He chooses a number between 00 and 99. The student whose phone number has the last two digits closest to the one the professor has chosen is selected.

28- What is the relationship among the mean, median, and mode in a positively skewed distribution?
A) The mean is always the largest value.
B) The mode is the largest value.
C) They are all equal.
D) The mean is always the smallest value.

29- According to the Empirical Rule, the percentage of observations in a data set (providing that the data set has a bell-shaped and symmetric distribution) that should fall within two standard deviations of their mean is approximately:
A) 90%
B) 100%
C) 97.5%
D) 95%

30- Which of the following statements is true?
A) The values of the standard deviation may be either positive or negative, while the value of the variance will always be positive.
B) The standard deviation is expressed in terms of the original units of measurement but the variance is not.
C) The range is found by taking the difference between the high and low values and dividing that value by 2.
D) The interquartile range is found by taking the difference between the 1st and 3rd quartiles and dividing that value by 2.

31- Which of the following measures of dispersion is based on deviations from the mean?
A) box plots
B) coefficient of variation
C) standard deviation
D) range

32- Which term is used to describe the probability of the intersection of events A and B?
A) conditional probability
B) subjective probability
C) joint probability
D) marginal probability

33- A recent survey showed that 15% of computer programmers have experienced some form of wrist pain from typing, and that 25% are taking aspirin daily. Six percent of all programmers have both experienced some form of wrist pain from typing and taken aspirin on a daily basis. What is the probability that a programmer has wrist pain or takes aspirin on a daily basis or both?
A) 0.15
B) 0.25
C) 0.66
D) 0.34
Hint: \(P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\)

The next three questions are based on the following information: A supplier is evaluating a firm to manufacture a subassembly. Quality data from past inspections reveal the following probabilities for number of defective parts in a shipment:

Number defective   0      1      2      >2
Probability        0.8    0.1    0.06   0.04

34- What is the probability that there will be fewer than 2 defective parts in a shipment?
A) 0.20
B) 0.90
C) 0.86
D) 0.10

35- What is the probability that the shipment will have defective parts?
A) 0.20
B) 0.90
C) 0.86
D) 0.10
Hint: 1 – P(no defective)

36- What is the probability that the shipment will have at least two defective parts?
A) 0.20
B) 0.90
C) 0.86
D) 0.10

37- In a recent survey of consumer confidence, 160 respondents were classified by their level of education. The results of the survey are presented below.

Are the events "had completed college education" and "had high confidence" statistically independent?
A) Yes
B) No
C) Maybe
D) There is not sufficient information to determine

38- The probabilities that may be computed by summing the corresponding row or column of a two-way table are called:
A) individual probabilities
B) marginal probabilities.
C) bivariate probabilities
D) conditional probabilities.

The next two questions are based on the following dataset showing the relationship between two variables.

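39- Determine the least squares regression line.
A) \(\hat{y} = 0.46 + 3.57x\)
B) \(\hat{y} = 3.57 - 0.46x\)
C) \(\hat{y} = 0.46 - 3.57x\)
D) \(\hat{y} = 3.57 + 0.46x\)
Hint: we need to calculate \(\bar{x} = \frac{\sum x_i}{n}\), \(\bar{y} = \frac{\sum y_i}{n}\), \(s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}\), \(s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}\) and \(r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1) s_x s_y}\). Once we have those we can use \(b_1 = r \frac{s_y}{s_x}\) for the slope and \(b_0 = \bar{y} - b_1 \bar{x}\) for the y-intercept.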
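The steps in the hint for question 39 can be sketched in Python as follows. The dataset for this question is not reproduced in this text, so the numbers below are hypothetical and serve only to exercise the formulas; the function name `least_squares_line` is likewise an assumption.

```python
import math

def least_squares_line(xs, ys):
    """Return (b0, b1, r) following the hint: means, SDs, r, then slope and intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x       # slope
    b0 = y_bar - b1 * x_bar  # y-intercept
    return b0, b1, r

# Hypothetical data, NOT the dataset from the original problem
b0, b1, r = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b0, 2), round(b1, 2), round(r ** 2, 2))  # 2.2 0.6 0.6
```

Squaring the returned r gives the R Squared value asked for in the follow-up question.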

40- Compute the value of the R Squared (not as a percentage).
A) 0.63
B) 0.80
C) 2.67
D) 3.47

41- The residual is defined as the difference between the actual value of:
A) y and the estimated value of y
B) x and the estimated value of x
C) x and the estimated value of y
D) y and the estimated value of x

42- A regression analysis between weight (y in pounds) and height (x in inches) resulted in the following least squares line: \(\hat{y} = 120 + 5x\). This implies that if the height is increased by 1 inch, the weight is expected to:
A) increase by 1 pound
B) decrease by 120 pounds
C) decrease by 1 pound
D) increase by 5 pounds

43- A sample of 8 households was asked about their monthly income (x) and the number of hours they spend connected to the internet each month (y). The data yield the following statistics: \(\sum x_i = 324\), \(\sum y_i = 393\), \(\sum (x_i - \bar{x})^2 = 1720.875\), \(\sum (y_i - \bar{y})^2 = 1150\), and \(\sum (x_i - \bar{x})(y_i - \bar{y}) = 1090.5\). What is the y-intercept of the regression line of hours on income?
A) 64.23
B) 36.43
C) 23.46
D) 46.23
Hint: we need to first calculate \(\bar{x} = \frac{\sum x_i}{n}\), \(\bar{y} = \frac{\sum y_i}{n}\), \(s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}\), \(s_y = \sqrt{\frac{\sum (y_i - \bar{y})^2}{n-1}}\) and then we can calculate \(r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1) s_x s_y}\)
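For question 43 only summary statistics are given, so the same chain of formulas can be applied to the sums directly. A minimal Python sketch; the function and parameter names are assumptions introduced for this example.

```python
import math

def line_from_summary(n, sum_x, sum_y, ss_x, ss_y, sp_xy):
    """Return (b0, b1, r) from sums, sums of squares, and the sum of cross products."""
    x_bar = sum_x / n
    y_bar = sum_y / n
    s_x = math.sqrt(ss_x / (n - 1))
    s_y = math.sqrt(ss_y / (n - 1))
    r = sp_xy / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x       # slope
    b0 = y_bar - b1 * x_bar  # y-intercept
    return b0, b1, r

# Summary statistics given in question 43
b0, b1, r = line_from_summary(n=8, sum_x=324, sum_y=393,
                              ss_x=1720.875, ss_y=1150, sp_xy=1090.5)
print(round(b0, 2))  # 23.46
```

Because the (n - 1) factors cancel, the slope also equals the sum of cross products divided by the sum of squared x deviations, which is a convenient shortcut when only these sums are available.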

44- A regression analysis between sales (in $1000) and advertising (in $100) resulted in the following least squares line: \(\hat{y} = 75 + 5x\). This implies that if advertising is $800, then the predicted amount of sales (in dollars) is:
A) $179,000
B) $164,000
C) $115,000
D) $4075

45- What does a coefficient of correlation of 0.80 indicate?
A) Sixty four percent of the variation in the dependent variable is explained by the independent variable
B) Sixteen percent of the relationship between the dependent and independent variables are linear.
C) Eight percent of the variation in the independent variable is explained by the dependent variable.
D) Eighty percent of the variation in the independent variable is explained by the dependent variable.

46- Which of the following is true about a probability distribution?
A) The probability of each outcome must be between 0 and 1, inclusive.
B) The representation must be graphed, not tabular or algebraic.
C) The outcomes do not need to be mutually exclusive.
D) The sum of all possible outcomes must not equal 1.

47- The probability that a person catches a cold during the cold and flu season is 0.4. Assume that 10 people are chosen at random. What is the probability that exactly four of them will catch a cold?
A) 0.2508
B) 0.3670
C) 0.6330
D) 0.7502
Hint: \(p = 0.4\), \(P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}\)

48- The sum of the product of each value of a discrete random variable X and its probability is referred to as its:
A) variance
B) standard deviation
C) probability distribution
D) expected value

49- Let the random variable Z follow a standard normal distribution. Find \(P(0 < Z < 0.57)\).
A) 0.2157
B) 0.7843
C) 0.7157
D) 0.2843
Hint: \(P(a < Z < b) = P(Z < b) - P(Z < a)\)

50- Let the random variable Z follow a standard normal distribution. Find the value \(k\), such that \(P(Z > k) = 0.73\).
A) -0.16
B) 0.73
C) -0.61
D) 0.27

51- Why is the central limit theorem important in statistics and data analysis?
A) Because for a large sample size \(n\), it says the population is approximately normal.
B) Because for any population, it says the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population.
C) Because for a large sample size \(n\), it says the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population.
D) Because for any sample size \(n\), it says the sampling distribution of the sample mean is approximately normal.

52- If the standard error of the sampling distribution of the sample proportion is 0.0229 for samples of size 400, then the population proportion must be either:
A) 0.2 or 0.8
B) 0.5 or 0.5
C) 0.3 or 0.7
D) 0.4 or 0.6
Hint: \(SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}\)
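Question 52 uses the standard error formula backwards: squaring both sides gives p(1 - p) = SE² · n, a quadratic in p with two roots that sum to 1. A minimal Python sketch of that inversion; `proportions_from_se` is a hypothetical helper name.

```python
import math

def proportions_from_se(se, n):
    """Solve sqrt(p * (1 - p) / n) = se for p; returns the two roots (low, high)."""
    c = se ** 2 * n  # equals p * (1 - p)
    disc = 1 - 4 * c # discriminant of p^2 - p + c = 0
    if disc < 0:
        raise ValueError("no real proportion gives this standard error")
    root = math.sqrt(disc)
    return (1 - root) / 2, (1 + root) / 2

# Question 52: SE = 0.0229 for samples of size 400
low, high = proportions_from_se(0.0229, 400)
print(round(low, 1), round(high, 1))  # 0.3 0.7
```

The two roots are symmetric around 0.5, which is why each answer option lists a pair of proportions.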

53- Suppose that 20% of all invoices are for amounts greater than $800. A random sample of 50 invoices is taken. What is the mean and standard error of the sample proportion of invoices with amounts in excess of $800?
A) mean = 10, standard error = 0.0598
B) mean = 0.20, standard error = 0.0566
C) mean = 0.20, standard error = 0.0032
D) mean = 10, standard error = 0.4472
Hint: \(p = 0.2\), \(n = 50\), \(SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}\)

The next two questions are based on the following information: A stock analyst has provided estimates of a corporation's expected return over the next year. This return is likely to depend on the interest rate, so the analyst has developed the following two-way table containing probabilities.

Interest Rate    Return < 8%   8% ≤ Return < 12%   Return ≥ 12%
IR < 3%          0.09          0.15                0.16
3% ≤ IR < 5%     0.14          0.17                0.05
IR ≥ 5%          0.16          0.07                0.01

54- What is the probability that the stock has a return of at least 8%?
A) 0.53
B) 0.48
C) 0.21
D) 0.61
E) 0.90
Hint: P(Return ≥ 8%) = 0.15 + 0.17 + 0.07 + 0.16 + 0.05 + 0.01

55- If the interest rate remains below 5%, what is the probability that the corporation's return is at least 8%?
A) 0.511
B) 0.578
C) 0.697
D) 0.142
Hint: P(Return ≥ 8% | Interest Rate < 5%) = P(Return ≥ 8% AND Interest Rate...
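Marginal and conditional probabilities of the kind asked for in questions 54 and 55 can be computed from the two-way table by summing cells. The sketch below is one possible encoding in Python; the dictionary layout, the string labels, and the helper `prob` are assumptions made for illustration, and the conditional probability uses the standard definition P(A | B) = P(A and B) / P(B).

```python
# Joint probabilities from the two-way table: joint[interest_rate][return_band]
joint = {
    "IR<3%":     {"R<8%": 0.09, "8%<=R<12%": 0.15, "R>=12%": 0.16},
    "3%<=IR<5%": {"R<8%": 0.14, "8%<=R<12%": 0.17, "R>=12%": 0.05},
    "IR>=5%":    {"R<8%": 0.16, "8%<=R<12%": 0.07, "R>=12%": 0.01},
}

def prob(rate_rows, return_cols):
    """Sum the joint probabilities over the chosen rows and columns."""
    return sum(joint[r][c] for r in rate_rows for c in return_cols)

all_rates = list(joint)
all_returns = list(joint["IR<3%"])
high_return = ["8%<=R<12%", "R>=12%"]  # return of at least 8%
low_rates = ["IR<3%", "3%<=IR<5%"]     # interest rate below 5%

# Marginal: P(Return >= 8%)
p_return = prob(all_rates, high_return)

# Conditional: P(Return >= 8% | IR < 5%) = P(both) / P(IR < 5%)
p_cond = prob(low_rates, high_return) / prob(low_rates, all_returns)

print(round(p_return, 2), round(p_cond, 3))  # 0.61 0.697
```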

