Assignment 2 (A2) PDF

Title Assignment 2 (A2)
Course Introduction to Statistics
Institution University of Auckland
Pages 6
File Size 296.2 KB
File Type PDF

Summary

A+ Exemplar...


Description

Kristina Baird  I.D. 721870040 1

Question 1: a)

This study is an experiment, as the researcher controls which oranges receive which juicing method. The three treatment groups are hand, microwave, and juicer. The allocation is random: the oranges are randomly picked from the supermarket and randomly placed into the three treatment groups.

b) i)

The median volume of orange juice obtained by hand juicing is approximately 65, whereas the median volume obtained via the microwave is approximately 67, and the median volume obtained using a mechanical juicer is approximately 78. In the graph below we see that the difference in the volume of juice obtained is 4.5.

ii)

Out of 1000 re-randomisations, only 23 produced an average deviation from the median at least as big as the one observed. Therefore it would be unusual to get the observed average deviation from the median if chance were acting alone.

iii)

It is not plausible that the observed differences between the three group medians can be explained only by “chance acting alone”, because in the re-randomisation distribution the tail proportion is 23/1000 = 0.023 = 2.3%. Because this percentage is below 5%, we have evidence that chance was not acting alone.
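The re-randomisation reasoning in parts ii) and iii) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the juice volumes are hypothetical (the assignment's data are not reproduced here), and the "average deviation" is assumed to be the mean absolute distance of each group median from the overall median, which may differ from the tool used in the course.

```python
import random
import statistics


def average_deviation(groups):
    """Mean absolute distance of each group median from the overall median.

    Assumed definition of the test statistic; the course tool may differ.
    """
    pooled = [value for group in groups for value in group]
    overall = statistics.median(pooled)
    return statistics.mean(abs(statistics.median(g) - overall) for g in groups)


def rerandomisation_test(groups, n_rerandomisations=1000, seed=1):
    """Return the observed statistic and the tail proportion."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = average_deviation(groups)
    pooled = [value for group in groups for value in group]
    sizes = [len(group) for group in groups]
    at_least_as_big = 0
    for _ in range(n_rerandomisations):
        rng.shuffle(pooled)  # re-allocate the values to groups at random
        shuffled_groups, start = [], 0
        for size in sizes:
            shuffled_groups.append(pooled[start:start + size])
            start += size
        if average_deviation(shuffled_groups) >= observed:
            at_least_as_big += 1
    return observed, at_least_as_big / n_rerandomisations


# Hypothetical juice volumes, NOT the assignment's data.
hand = [60, 63, 65, 66, 70]
microwave = [62, 66, 67, 69, 71]
juicer = [74, 77, 78, 80, 83]

observed, tail_proportion = rerandomisation_test([hand, microwave, juicer])
print(f"observed = {observed}, tail proportion = {tail_proportion}")
```

A tail proportion below 0.05 would be read the same way as the 23/1000 above: a statistic this large would be unusual if chance were acting alone.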


c) We can conclude that the type of juicing method affected the volume of juice obtained. We can make this conclusion because, as stated above, the tail proportion is too small (below 5%) for chance to have been acting alone. A second reason to support this conclusion is that by re-randomising our data, we will still see an observable difference between the volume of juice obtained through the three different types of juicing methods.

Re-randomised data:

Question 2: a) i)

A customer who has received excellent service while at the restaurant would be more inclined to fill in and return the questionnaire. Regular customers as well as elderly people would also be more likely to fill in the questionnaire.

ii)

The results from this questionnaire would not be an accurate representation of how satisfied customers are with the service. This is because of non-response bias, which arises when those you want to survey do not respond. Therefore, the results from the restaurant will not be an accurate representation of the whole population of customers.

iii)

While the results of the questionnaire cannot provide much useful information on the restaurant's customers, it will provide information beneficial to improving service for a specific type of customer (i.e. happy customers, regulars, elderly people etc.).

b) One type of non-sampling error that would most likely be present in the results of this survey is behavioural considerations: the chosen subjects tend to answer questions in a way they believe is socially acceptable. In this survey, people being surveyed will tend to give the answer that the surveyor wants instead of answering truthfully.

Non-response bias is another type of non-sampling error, where the subjects chosen do not respond. This would have to be taken into consideration in this survey, as not everyone who is approached will choose to respond. This could be for many reasons: for example, not everyone likes being approached by strangers, or customers may be in a rush.

A third type of non-sampling error that would be present in the results of this survey is selection bias. Selection bias occurs when proper randomisation is not achieved, meaning that the results obtained will not be an accurate representation of the population. This can be seen in the sporting goods survey, as it would only survey those who enter the store from 10am to 3pm on Tuesday for a period of 4 weeks, so not the whole population has a chance of being surveyed.

A fourth type of non-sampling error that could be a potential problem with the survey is interviewer error. This is when those conducting the survey influence the responses through how they interact with the chosen subjects. This could be present in the survey, as the rugby players surveying the customers would directly or indirectly affect how customers respond.

Question 3: a) i)

The median of a data set is the middle number in a sorted list, whereas the mean is the average of the data set. For the waiting times of customers receiving coffee, the median would be a better estimate of the centre of the data because, unlike the mean, it is largely unaffected by outliers and skewed distributions. Wait times for coffee can be extremely uneven, as cafes have both “busy hours” and “slow hours” as well as typically busy days. Because of this, if a mean were used to calculate an average, it would be affected by outliers.
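A small Python sketch can illustrate why the median is preferred here. The wait times are hypothetical values chosen for illustration, not the assignment's data: adding one very long wait moves the mean a lot and the median only slightly.

```python
import statistics

# Hypothetical wait times in seconds, NOT the assignment's data.
typical_waits = [60, 70, 75, 80, 85]
with_outlier = typical_waits + [600]  # one customer waits ten minutes

mean_before = statistics.mean(typical_waits)
median_before = statistics.median(typical_waits)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean is pulled strongly towards the outlier; the median barely moves.
print(f"mean:   {mean_before} -> {mean_after:.1f}")
print(f"median: {median_before} -> {median_after}")
```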


ii)

iii)

The parameter we are estimating using the bootstrap confidence interval is the median wait time for customers ordering and receiving coffee. Its value is 82 seconds.

iv)

The true value of the parameter is 82 seconds.

v)

The bootstrap confidence interval for the median wait time of customers ordering and receiving their coffee is: (72,91).
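As a rough illustration of where an interval like this comes from, the sketch below builds a percentile bootstrap interval for a median in Python. The wait times are hypothetical, and the percentile method with 1000 resamples is an assumption; the course tool may construct its interval differently.

```python
import random
import statistics


def bootstrap_median_ci(sample, n_resamples=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for the median of one sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    medians = sorted(
        # Resample with replacement, same size as the original sample.
        statistics.median(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    cut = round(n_resamples * (1 - level) / 2)  # resamples trimmed per tail
    return medians[cut], medians[n_resamples - 1 - cut]


# Hypothetical wait times in seconds, NOT the assignment's data.
wait_times = [55, 61, 68, 72, 75, 80, 82, 84, 90, 95, 101, 120, 150]

lower, upper = bootstrap_median_ci(wait_times)
print(f"sample median = {statistics.median(wait_times)}, interval = ({lower}, {upper})")
```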


b) i)

ii)

The parameter we are estimating using the bootstrap confidence interval is the difference in median wait times between male and female customers ordering coffee. The difference is 22 seconds.

iii)

The bootstrap confidence interval for the difference in median wait times between male and female customers: (4, 38).

iv)

From the above graph we see that the median wait time for females ordering coffee is longer than the median wait time for males ordering coffee, by 22 seconds. Because the bootstrap confidence interval for the difference, (4, 38), does not include zero, it is not plausible that the median wait time for males is the same as the median wait time for females.
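A minimal Python sketch of a bootstrap interval for a difference in medians, and of checking it against zero, is given here. The two groups of wait times are hypothetical, and resampling each group separately with the percentile method is an assumption rather than the documented behaviour of the course tool.

```python
import random
import statistics


def bootstrap_median_difference_ci(group_a, group_b, n_resamples=1000,
                                   level=0.95, seed=1):
    """Percentile bootstrap interval for median(group_a) - median(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    differences = sorted(
        # Each group is resampled separately, with replacement.
        statistics.median(rng.choices(group_a, k=len(group_a)))
        - statistics.median(rng.choices(group_b, k=len(group_b)))
        for _ in range(n_resamples)
    )
    cut = round(n_resamples * (1 - level) / 2)  # resamples trimmed per tail
    return differences[cut], differences[n_resamples - 1 - cut]


# Hypothetical wait times in seconds, NOT the assignment's data.
female = [70, 78, 85, 90, 94, 99, 104, 110, 118, 125, 140]
male = [52, 60, 66, 71, 75, 77, 82, 88, 93, 101, 115]

observed = statistics.median(female) - statistics.median(male)
lower, upper = bootstrap_median_difference_ci(female, male)

# If zero lies inside the interval, "no difference" remains plausible.
zero_is_plausible = lower <= 0 <= upper
print(f"observed difference = {observed}, interval = ({lower}, {upper})")
print(f"zero inside interval: {zero_is_plausible}")
```

The check that matters is whether zero lies inside the interval for the difference, not whether the individual group medians do.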

Question 4: a)

1. $\mu$ = the average (mean) time for pizza to be delivered over a time period of a few weeks.
2. $\bar{x}$ = the mean time for delivery for a sample of 10 delivery times = 24.11 minutes.
3. Formula: estimate ± t × se(estimate)
4. $\mathrm{se}(\bar{x}) = 1.8183$ (t procedures tool)
5. For a 95% confidence interval with $df = n - 1$, use $t = 2.2622$ (t procedures tool)
6. An approximate 95% confidence interval for $\mu$ is: $\bar{x} \pm t \times \mathrm{se}(\bar{x}) = 24.11 \pm 2.2622 \times 1.8183 = 24.11 \pm 4.1134$, giving (20.00, 28.22)
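The arithmetic in step 6 can be checked with a short Python sketch. The estimate, standard error, and t multiplier are taken directly from the steps above; the function name `t_interval` is only an illustrative helper.

```python
def t_interval(estimate, se, t):
    """Return (margin, lower, upper) for estimate ± t × se(estimate)."""
    margin = t * se
    return margin, estimate - margin, estimate + margin


# Values quoted in the working above.
margin, lower, upper = t_interval(estimate=24.11, se=1.8183, t=2.2622)
print(f"24.11 ± {margin:.4f} gives ({lower:.2f}, {upper:.2f})")
```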


For all pizza delivery times over a time period of a couple of weeks, we estimate, with 95% confidence, that the mean delivery time is somewhere between 20 and 28 minutes.

b) The confidence interval gives a range of plausible values for the mean delivery time; it does not describe 95% of individual delivery times. Individual pizza delivery times can be either shorter or longer than the values in our confidence interval. Since the interval for the mean lies between 20 and 28 minutes, we cannot be confident that pizza will be delivered within 20 minutes.

Question 5: a) i) C  ii) B  iii) C

b)

1. $p_C - p_D$ = the difference between the proportion of people under 50 whose first preference of programme type was comedy and the proportion of people under 50 whose first preference of programme type was documentaries.
2. $\hat{p}_C - \hat{p}_D$ = the difference between the proportion of people in the sample under 50 whose first preference of programme type was comedy and the proportion of people in the sample under 50 whose first preference of programme type was documentaries = $\frac{291}{994} - \frac{161}{994} = 0.2928 - 0.1620 = 0.1308$
3. The formula estimate ± t × se(estimate) gives $(\hat{p}_C - \hat{p}_D) \pm t \times \mathrm{se}(\hat{p}_C - \hat{p}_D)$
4. $\hat{p}_C = \frac{291}{994} = 0.2928$, $\hat{p}_D = \frac{161}{994} = 0.1620$, $\hat{p}_C - \hat{p}_D = 0.1308$. Sampling situation: B. $\mathrm{se}(\hat{p}_C - \hat{p}_D) = 0.021$
5. For a 95% confidence interval with $df = \infty$, use $t = 1.96$
6. An approximate 95% confidence interval for $p_C - p_D$ is: $(\hat{p}_C - \hat{p}_D) \pm t \times \mathrm{se}(\hat{p}_C - \hat{p}_D) = (0.2928 - 0.1620) \pm 1.96 \times 0.021 = 0.1308 \pm 0.04116$, giving (0.0896, 0.1720)
7. With 95% confidence, we estimate that the proportion of people under 50 whose first preference of programme type was comedy is somewhere between 9 and 17 percentage points greater than the proportion of people under 50 whose first preference of programme type was drama....
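The interval in step 6 can be reproduced with a short Python sketch. The counts (291, 161, 994) and the multiplier 1.96 come from the working above. The standard error formula is an assumption: "Sampling situation: B" is taken here to mean one sample with several response categories, for which a commonly used formula is sqrt((p1 + p2 − (p1 − p2)²) / n). It does reproduce the quoted value of 0.021.

```python
import math

n = 994                  # people under 50 in the sample
count_comedy = 291       # first preference: comedy
count_documentary = 161  # first preference: documentaries

# Round to four decimal places, as in the written working.
p_c = round(count_comedy / n, 4)
p_d = round(count_documentary / n, 4)
difference = p_c - p_d

# ASSUMED formula for two proportions from one sample with several
# response categories (the source only states "Sampling situation: B").
se = math.sqrt((p_c + p_d - difference ** 2) / n)
se_rounded = round(se, 3)

t = 1.96  # multiplier for a 95% interval with df = infinity
margin = t * se_rounded
lower, upper = difference - margin, difference + margin

print(f"se = {se_rounded}")
print(f"{difference:.4f} ± {margin:.5f} gives ({lower:.4f}, {upper:.4f})")
```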

