2020-S2-A1 s - STAT1070 Assignment 1 Solutions PDF

Title 2020-S2-A1 s - STAT1070 Assignment 1 Solutions
Author Abrar N
Course Statistics for the Sciences
Institution University of Newcastle (Australia)
Pages 11


STAT1070 Assignment 1 Solutions
Semester 2, 2020

Due: Electronically via Blackboard by 11:59pm, Sunday 13 September.

Please justify your answer to each question. This justification can involve hand calculation or relevant interpretation of output from jamovi and/or statstar.io. If a question requires a calculation, please show your working. If a question requires output from statstar.io, please provide and refer to this output accordingly. Do not simply copy and paste jamovi or statstar.io output; provide a concise interpretation of this output where appropriate.

Your assignment does not need a cover sheet. There is a template on Blackboard that you can use to help structure your assignment. The majority of your assignment should be typed; however, if you need to hand draw anything, you can include it in your document by scanning or photographing your working. It is best to submit your assignment as a PDF to ensure that it uploads successfully. If your assignment contains excessive handwriting, you may experience problems uploading it.

Make sure you receive a receipt for submission and record or save the information in it. It is your responsibility to ensure that your assignment has been uploaded properly. If you experience problems submitting your assignment, email [email protected] a copy of the completed assignment before the due date and outline the problems you are having with the submission.

The mark for an assignment submitted after the due date and time, without an approved extension of time, will be reduced by 10% of the possible maximum mark for the assessment item for each day or part day that the submission is late. Note: this applies equally to week and weekend days. Given the need to make the solutions available to students, assignments will not be accepted 7 calendar days beyond the due date.


Question 1.

[9 marks]

gambusia.omv contains measurements of mosquitofish (Gambusia holbrooki) randomly collected from an artificial pond on the Ourimbah Campus during March 2000. The fish were collected by 9 groups of 3rd year university students and returned to the lab for determination of sex, length (cm) and weight (grams).

Mosquitofish are small, highly aggressive fish that were introduced to Australia by Health Authorities in the 1920s. They were imported as a biological control for mosquitoes, hence the name. However, they eat a wide variety of foods, including mosquito larvae, and are currently widespread throughout Australia. The mosquitofish is now considered a pest species and a major threat to Australian native fish species.

(a) [2 marks] Produce a graph and appropriate summary statistics, then write a short paragraph to describe the distribution of Length of the fish.
(b) [2 marks] Produce a graph and appropriate summary statistics, then write a short paragraph to describe the distribution of Weight of the fish.
(c) [2 marks] Using an appropriate graph, investigate if there is a relationship between the Length and Weight of the fish.
(d) [3 marks] Using an appropriate graph and summary statistics, investigate if there are any differences in the Length of Male and Female Gambusia holbrooki. Does there appear to be any difference in the spread?

Solution:

(a) A plot of the distribution of fish length is shown in Figure 1. This distribution appears to be roughly symmetric, or it may be slightly skewed to the right. The values are centred near a mean of 2.14cm and a median of 2.12cm, with a standard deviation of 0.335cm and an interquartile range (IQR) of 0.34cm.
[1 mark] for plot
[1 mark] for correctly identifying the three S’s

(b) A plot of the distribution of fish weight is shown in Figure 2. This distribution appears to be skewed to the right. Due to the skewness present in the distribution, both the mean and the median are presented as measures of centre. Similarly, both the standard deviation and the IQR are provided as measures of spread. The values are located near a mean of 0.182g and a median of 0.165g, with a standard deviation of 0.0959g and an IQR of 0.085g.
[1 mark] for plot
[1 mark] for correctly identifying the three S’s

(c) The scatter plot in Figure 3 shows that there is clearly a relationship between length and weight. It is a non-linear relationship that is positive: as length increases, we expect an increase in weight.
[1 mark] for plot
[1 mark] for correctly identifying non-linear and positive

Figure 1: Plot and summary statistics of fish length.


Figure 2: Plot and summary statistics of fish weight.


Figure 3: Scatterplot of weight against length.

Extension: The relationship between length and weight is curved because it follows a power relationship: weight increases as a power of length (because the fish are 3 dimensional and stay a similar shape). You can see this by plotting the log of weight against the log of length and noting the linear relationship.

(d) Looking at either the vertically aligned histograms or the side-by-side box plots in Figure 4, the lengths of the two sexes appear to be located at roughly the same values. The two sample means are 2.13cm and 2.17cm. This difference may simply be due to sampling error. There appears to be a difference in variation, with females being more variable in length than males. The sample standard deviations are 0.387cm for females and 0.237cm for males.
[1 mark] for plot
[2 marks] for comments on location and spread
Note: either plot is fine
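The Extension note for part (c) can be checked numerically. The sketch below is only an illustration on synthetic, noise-free values, not the gambusia.omv data: the constants `a` and `b`, the list of lengths and the helper `ols_slope_intercept` are all hypothetical. It generates weights from an assumed power law, weight = a × length^b, and shows that a straight-line fit on the log-log scale recovers b as the slope.

```python
import math

# Synthetic illustration only (NOT the gambusia.omv measurements).
# Assumed power law: weight = a * length**b, with made-up constants.
a, b = 0.018, 3.0
lengths = [1.5, 1.8, 2.1, 2.4, 2.7, 3.0]          # hypothetical lengths (cm)
weights = [a * length**b for length in lengths]   # implied weights (g)

# Taking logs turns the power law into a straight line:
# log(weight) = log(a) + b * log(length)
log_l = [math.log(length) for length in lengths]
log_w = [math.log(weight) for weight in weights]


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for a simple linear fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = ols_slope_intercept(log_l, log_w)
print("slope on log-log scale:", round(slope, 6))
print("exp(intercept):", round(math.exp(intercept), 6))
```

With real measurements the points would scatter around the line rather than lie exactly on it, and the fitted slope would only estimate the exponent.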


Figure 4: Plots and summary statistics for investigating a difference in length between the sexes.

Question 2.

[12 marks]

According to the Australian Bureau of Statistics, 6.2% of Australians are depressed. An estimated 2.5% of Australians are mothers with at least one child under 2 years old. In total, 0.5% of Australians are mothers with at least one child under 2 years old and are depressed.

(a) [5 marks] Using this information, construct a contingency table of probabilities of depression and being a mother of at least one child under 2 years old.
(b) [2 marks] What is the probability that a randomly selected Australian is depressed or is a mother of at least one child under 2 years old?
(c) [3 marks] If a randomly selected Australian is a mother of a child under 2 years old, what is the probability that she is depressed?
(d) [2 marks] For Australians, is being depressed independent of being a mother of a child under 2 years old? Explain your answer.

Solution:

(a) Let “young mother” represent a mother of at least one child under 2 years old.

                                        Mental Health
Motherhood                    Depressed (D)   Not Depressed (not D)   Total
Young mother (M)              0.5%            2.0%                    2.5%
Not a young mother (not M)    5.7%            91.8%                   97.5%
Total                         6.2%            93.8%                   100%

[1 mark] for 2×2 structure of table
[1 mark] for 0.5% in the depressed/young mother cell
[1 mark] for 6.2% in marginal depressed cell
[1 mark] for 2.5% in marginal young mother cell
[1 mark] for all other percentages

Comment: It is also possible to construct a contingency table with mental health status forming the rows and motherhood status forming the columns.

(b) P(D or M) = P(D) + P(M) − P(D and M) = 0.062 + 0.025 − 0.005 = 0.082
[1 mark] for addition rule
[1 mark] for correct answer

(c)
$$P(D \mid M) = \frac{P(D \text{ and } M)}{P(M)} = \frac{0.005}{0.025} = 0.2$$
[1 mark] for P(D|M)
[1 mark] for conditional probability formula
[1 mark] for correct answer

(d) D and M are not independent, because:
$$P(D \mid M) = 0.2 \neq 0.062 = P(D) \tag{1}$$
$$P(D \text{ and } M) = 0.005 \neq 0.00155 = P(D)\,P(M) \tag{2}$$
[1 mark] for identifying that D and M are not independent
[1 mark] for either equation (1) or (2)
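The arithmetic in Question 2 can be reproduced with a short script. This is a minimal sketch rather than part of the marked solution: the variable names are our own, and the only inputs are the three percentages given in the question (6.2%, 2.5% and 0.5%).

```python
import math

# Probabilities stated in the question
p_d = 0.062        # P(D): depressed
p_m = 0.025        # P(M): mother of at least one child under 2
p_d_and_m = 0.005  # P(D and M)

# (a) Remaining cells of the contingency table follow by subtraction
table = {
    ("M", "D"): p_d_and_m,
    ("M", "not D"): p_m - p_d_and_m,
    ("not M", "D"): p_d - p_d_and_m,
    ("not M", "not D"): 1 - p_d - p_m + p_d_and_m,
}

# (b) Addition rule
p_d_or_m = p_d + p_m - p_d_and_m

# (c) Conditional probability
p_d_given_m = p_d_and_m / p_m

# (d) Independence would require P(D and M) = P(D) * P(M)
product = p_d * p_m
independent = math.isclose(p_d_and_m, product)

for cell, prob in table.items():
    print(cell, f"{prob:.1%}")
print("P(D or M) =", round(p_d_or_m, 3))
print("P(D | M)  =", round(p_d_given_m, 3))
print("P(D)P(M)  =", round(product, 5), "independent:", independent)
```

The four cells sum to 1, which is a quick check that the table has been filled in consistently.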

Question 3.

[12 marks]

A company that markets user-assembled furniture sells a computer desk that is advertised with the claim “less than an hour to assemble”. The company, through its own testing, has found that the assembly times follow a normal distribution with mean and standard deviation of 77.45 and 25.87 minutes respectively.

(a) [2 marks] Using this information, find what proportion of customers will complete the build in less than one hour.
(b) [2 marks] One way the company could solve this problem would be to change the advertising claim. What assembly time should the company quote in order that 60% of customers succeed in finishing the desk within that time?
(c) [2 marks] Wishing to maintain the “less than an hour to assemble” claim, the company hopes that revising the instructions and labelling the parts more clearly can improve the 1 hour success rate to 60%. If the standard deviation stays the same, show using the formula for transforming normal random variables that the new mean time will need to be 53.455 minutes.
(d) [3 marks] Assume that the company improved their plans for building the furniture and achieved a mean time of 53.455 minutes. A consumer advocacy group wishes to conduct their own testing and randomly selects 10 people to assemble the desk. It is recorded whether each person is able to complete the build within an hour or not. What type of distribution will the number of people who complete the build on time in this sample follow if the company’s claim is true? Explain your reasoning.
(e) [3 marks] Suppose that of the 10 randomly selected people asked to assemble the desk, only 4 managed to complete the assembly within 1 hour. One member of the team conducting this test says that clearly the company’s claim, that 60% of people can complete the assembly within 1 hour, is incorrect as 4 out of 10 is not 60%. Another team member is not so sure. What do you think? Does this test provide sufficient evidence to refute the company’s claim?
(Hint: If the company’s claim is correct, what is the probability that of the 10 people selected, four or fewer completed the assembly within the hour?)

Solution:

(a) Let X represent assembly time in minutes. We assume assembly time follows the normal probability model, X ∼ N(77.45, 25.87²). We are interested in P(X ≤ 60). This is presented in Figure 5.


Figure 5: Statstar output showing P(X ≤ 60).

Therefore, only 25.00% of customers will be able to build the furniture in less than 1 hour.
[1 mark] for output or working
[1 mark] for presentation of the answer

(b) Now we are interested in P(X < x) = 0.60 where X ∼ N(77.45, 25.87²). The Statstar output in Figure 6 shows that x = 84.004; therefore, 60% of customers will be able to complete the build in less than 84 minutes.

Figure 6: Statstar output showing how long it should take 60% of customers to build the furniture.

[1 mark] for output or working
[1 mark] for presentation of the answer

(c) We are now interested in determining the mean of a new distribution; let’s call it X2. We know that this will have the same standard deviation, and we will assume it remains normal. Therefore we have X2 ∼ N(µ, 25.87²). We want to find the mean µ of this distribution, based on 60% of customers being able to complete the build in less than 60 minutes. For a standard normal, P(Z < 0.253) = 0.60.


Figure 7: Statstar output showing P(Z < z) = 0.60.

We can use the above information together with the formula for transforming to a standard normal to solve for the mean of X2.

$$
\begin{aligned}
z &= \frac{x_2 - \mu}{\sigma} \\
0.253 &= \frac{60 - \mu}{25.87} \\
6.545 &= 60 - \mu \\
\mu &= 53.455
\end{aligned}
$$

Therefore the mean of the new distribution will be 53.455 minutes as required. This result can be confirmed in Statstar.

Figure 8: Statstar output showing P(X2 < 60) = 0.60 for X2 ∼ N(53.455, 25.87²).

[1 mark] for finding the z-score
[1 mark] for using the transformation formula
Note: Must use the transformation formula; simply manipulating the mean in Statstar is not sufficient.
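The normal-distribution results in parts (a) to (c) can be cross-checked without Statstar. The sketch below is an assumption on our part, not the method used in the solution: it relies on `statistics.NormalDist` from the Python standard library, and the variable names are our own.

```python
from statistics import NormalDist

sigma = 25.87                                  # standard deviation (minutes)
original = NormalDist(mu=77.45, sigma=sigma)   # X ~ N(77.45, 25.87^2)

# (a) Proportion finishing within one hour: P(X <= 60)
p_within_hour = original.cdf(60)

# (b) Time x such that P(X < x) = 0.60
time_for_60pct = original.inv_cdf(0.60)

# (c) z-score with P(Z < z) = 0.60, then solve z = (60 - mu) / sigma for mu
z_60 = NormalDist().inv_cdf(0.60)
new_mean = 60 - z_60 * sigma                       # using the unrounded z
new_mean_rounded_z = 60 - round(z_60, 3) * sigma   # using z = 0.253, as in the working

print("P(X <= 60)          =", round(p_within_hour, 4))
print("60th percentile     =", round(time_for_60pct, 3))
print("z for 0.60          =", round(z_60, 3))
print("new mean (z exact)  =", round(new_mean, 3))
print("new mean (z=0.253)  =", round(new_mean_rounded_z, 3))
```

The 53.455 quoted in the question comes from rounding z to 0.253 before solving; carrying the unrounded z through gives a mean of about 53.446, a difference that only reflects rounding.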

(d) In this question we are interested in a different random variable. Let Y represent the number of the 10 people asked who successfully assemble the desk within 1 hour. For this random variable there is a fixed number of trials, n = 10. Each of those trials has two possible outcomes, success or failure. The probability of success is constant and, if the company’s claim is correct, p = 0.60. Assuming that the people do not do the task together and are not allowed to talk about it, it is reasonable to assume that the trials are independent. Thus we can use the binomial probability model where Y ∼ B(n = 10, p = 0.6).
[1 mark] for identifying the variable as Binomial
[2 marks] for testing the assumptions

(e) Using Statstar we find that P(Y ≤ 4) = 0.166. This is presented in Figure 9.

Figure 9: Statstar output showing P(Y ≤ 4) = 0.166 for Y ∼ B(n = 10, p = 0.6).

Thus, if the company’s claim is correct, there is about a 17% probability that 4 or fewer of the 10 people in our trial will complete the desk assembly in an hour. This is about a 1 in 6 chance: not particularly likely, but probably not so unlikely that you would be prepared to claim that the company is lying in its advertising.
[1 mark] for identifying we want P(Y ≤ 4)
[1 mark] for correct calculation of the required probability
[1 mark] for appropriate interpretation of this probability to answer the question
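The binomial probability in part (e) can also be computed directly from the probability mass function. This is a minimal sketch assuming the model Y ∼ B(n = 10, p = 0.6) from part (d); the helper name `binom_cdf` is our own and is not a Statstar or jamovi function.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p), summing the pmf from 0 to k."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Probability that four or fewer of the 10 people finish within the hour,
# if the company's claimed success rate of 60% is correct.
p_four_or_fewer = binom_cdf(4, 10, 0.6)
print(round(p_four_or_fewer, 3))  # 0.166
```

Summing the five terms for Y = 0, 1, 2, 3, 4 by hand gives the same value, which matches the Statstar output in Figure 9.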
