PYM0S1 0 preliminary assessment answer PDF

Title PYM0S1 0 preliminary assessment answer
Course Psychology
Institution University of Reading
Pages 7
File Size 182.6 KB
File Type PDF

Summary

The answer section for the statistics test run by Prof Murayama...


Description

Model answers of PYM0S1-2, Preliminary statistics assessment

NOTE: These are just model answers and there may be better answers for each question. If you are unsure about your own answers, please feel free to contact Kou Murayama ([email protected]).

This assessment indicates the level of statistical knowledge that you should have at the start of the course. If you submit the assessment in Week 0, it will be returned with answers during the practical workshop in Week 1.

SECTION A. Basic test applications

A health survey is administered to a sample of 100 working adults. The items that each person is asked to complete include the following:

Item                                          Options                                                          Measurement scale (choose one)
a) SEX                                        Female / Male                                                    categorical/ordinal/interval
b) WEIGHT in kg                                                                                                categorical/ordinal/interval
c) HEIGHT in cm                                                                                                categorical/ordinal/interval
d) EDUCATION                                  Until age 16 / Until age 18 / Bachelor's Degree / Postgraduate   categorical/ordinal/interval
e) AREA OF RESIDENCE                          Scotland / North England / Wales / Midlands / South England      categorical/ordinal/interval
f) CIGARETTES SMOKED PER DAY - NOW                                                                             categorical/ordinal/interval
g) CIGARETTES SMOKED PER DAY - ONE YEAR AGO                                                                    categorical/ordinal/interval
h) SUFFERED HEART DISEASE IN LAST 2 YEARS?    Yes / No                                                         categorical/ordinal/interval

A1. (i) For each of the survey items (a)-(h), which of the 3 following scales of measurement does it represent (highlight one in the 3rd column of the table):

(ii) Imagine the distributions of the variables weight and number of cigarettes smoked.

a. Which distribution do you think would be likely to be more skewed (i.e. not normal)?

Cigarettes

b. Provide brief reason(s) for your answer.

There should be a number of non-smokers. As a result, the distribution should have a pile of zeros, producing a positively skewed distribution.

c. You computed the mean of that skewed variable. Briefly describe possible caution(s) when interpreting the computed mean.

The mean tends to be strongly influenced by extreme values and may not provide a good representative value of the number of cigarettes smoked. In this particular case, the median or mode may give you a more reasonable index of central tendency.
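The caution about the mean can be illustrated with a small sketch. The ten values below are hypothetical and are not taken from the survey; they only mimic a pile of zeros with a few heavy smokers.

```python
from statistics import mean, median

# Hypothetical daily cigarette counts for 10 people (NOT survey data):
# six non-smokers form the pile of zeros, a few heavy smokers form the right tail.
cigarettes = [0, 0, 0, 0, 0, 0, 5, 10, 20, 40]

mean_cigs = mean(cigarettes)      # pulled upwards by the few heavy smokers
median_cigs = median(cigarettes)  # stays at the value of the typical respondent

print(f"mean = {mean_cigs}, median = {median_cigs}")
```

With these made-up values the mean is 7.5 cigarettes per day, although most people in the sample smoke none, which is what the median (0) reports.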

A2. Here is a list of statistical significance tests:

Chi-square test
Spearman's rho test
Pearson's r test
Mann-Whitney U test
Independent sample t test
Paired sample t test
Wilcoxon signed-rank test
F test with One-way analysis of variance
F test with Two-way analysis of variance
F test with Regression analysis

Choose a test from the list to answer each of the questions (i)-(viii), and say briefly why this test is appropriate. NB, for some questions, there might be more than one possible test.

Example question: Do males and females differ in height?

Answer: independent sample t test, because we want to compare means from 2 samples (males and females) comprising different, unpaired subjects.

(i) Is there an association between height and weight?

Compute Pearson’s correlation coefficient and use Pearson's r test, because we want to know the relationship between two continuous variables. If the distributions are heavily skewed, Spearman’s rho test may be better to minimize the effects of outliers. Regression analysis can also examine the association between the two continuous variables.

(ii) Do people living in different areas differ in weight?

F test with One-way analysis of variance with weight being the DV and area being the IV, because there is only one factor (Area) consisting of more than 2 levels.

(iii) Are males (compared to females) more likely to have suffered heart disease in the last 2 years?

Chi-square test, because we are interested in the association between two categorical variables.

(iv) Have people in the population decreased their smoking over the last year?

Paired sample t test, because we want to compare the means of two conditions (# cigarettes now vs. # cigarettes last year), to which the same participants responded. If the distribution of the change in the number of cigarettes is strongly skewed, Wilcoxon signed-rank test may be better.

(v) Is there an association between the level of education and the current smoking behaviour (cigarettes smoked per day)?

Compute Spearman's rank correlation coefficient, because one variable (i.e., education level) is an ordered variable and the other variable (# cigarettes) is likely to be skewed.

(vi) Have the experiences of heart disease (in last 2 years) influenced people’s weight differently for males and females?

F test with Two-way analysis of variance with weight being the DV and heart disease and gender being the IVs, because there are two independent categorical factors (heart disease and gender) you want to examine. The interaction between the two factors tests whether the effect of heart disease differs between males and females.

(vii) Do males have higher levels of education (in comparison to females)?

Mann-Whitney U test, because you are interested in the comparison of two independent groups (males vs. females) with regard to an ordered DV.

(viii) Is the current smoking behaviour associated with weight even after controlling for height?

F test with Regression analysis with weight being the DV and current smoking behaviour and height being the IVs, because you are interested in the relationship between an IV (smoking behaviour) and the DV (weight) after controlling for the other IV (height).

# Partial correlation analysis that controls for height can also address the research question.
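For the partial correlation mentioned in (viii), a minimal sketch of the standard first-order formula is given below. The three correlation values are hypothetical placeholders, not results from the survey, and the function name is chosen here for illustration only.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical correlations (assumed values, NOT computed from the survey)
r_smoking_weight = 0.4   # x and y
r_smoking_height = 0.5   # x and the control variable z
r_weight_height = 0.6    # y and the control variable z

r_partial = partial_correlation(r_smoking_weight, r_smoking_height, r_weight_height)
print(round(r_partial, 3))
```

With these assumed inputs the association between smoking and weight shrinks from 0.4 to roughly 0.14 once height is held constant.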


SECTION B. The Normal Distribution and the z-statistic

A test of children's reading age is designed so that, in the general population of 10-year-old British children, scores are normally distributed with a mean of 10 years and a standard deviation of 2 years. The following questions refer to 10-year-old British children, assessed with that test.

B1. Three randomly chosen individual children are tested and found to have reading ages as follows. What z-scores or standardized scores correspond to these individual results?

1) 6 years       z = -2.0
2) 11.5 years    z = 0.75
3) 13.5 years    z = 1.75
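The z-scores in B1 follow from z = (score - mean) / SD with the population mean of 10 years and SD of 2 years given above. A short sketch (variable and function names are illustrative):

```python
POPULATION_MEAN = 10.0  # years, from the test design described above
POPULATION_SD = 2.0     # years

def z_score(reading_age):
    """Standardize a reading age against the population mean and SD."""
    return (reading_age - POPULATION_MEAN) / POPULATION_SD

z_scores = {age: z_score(age) for age in (6, 11.5, 13.5)}
for age, z in z_scores.items():
    print(f"{age} years -> z = {z}")
```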

B2. Use the z (normal distribution) table (or a software package) to say what is the probability that a randomly sampled child has a reading age as follows:

1) 6 years or lower               p = 0.02
2) 11.5 years or lower            p = 0.77
3) 13.5 years or higher           p = 0.04
4) Between 11.5 and 13.5 years    p = 0.19

# (4) The probability of being 11.5 years or lower is 0.77. The probability of being 13.5 years or lower is 0.96. Therefore, the probability of being between 11.5 and 13.5 years is 0.96 - 0.77 = 0.19.
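As one possible "software package" route, the probabilities in B2 can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch; the variable names are illustrative.

```python
from statistics import NormalDist

# Reading age in the population: normal with mean 10 and SD 2 (from the text)
reading_age = NormalDist(mu=10, sigma=2)

p_6_or_lower = reading_age.cdf(6)
p_11_5_or_lower = reading_age.cdf(11.5)
p_13_5_or_higher = 1 - reading_age.cdf(13.5)
# Area between two scores = difference of the two cumulative probabilities
p_between = reading_age.cdf(13.5) - reading_age.cdf(11.5)

for label, p in [("<= 6", p_6_or_lower), ("<= 11.5", p_11_5_or_lower),
                 (">= 13.5", p_13_5_or_higher), ("11.5 to 13.5", p_between)]:
    print(f"{label}: p = {p:.2f}")
```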


SECTION C. Correlation

You collected data from 500 school children aged 11 to 18. The data included height (in cm) and the number of words they can read within 1 minute (i.e., reading efficiency). When you analysed the data, you found a strong positive correlation (r = 0.5) between height and reading efficiency, indicating that taller children tend to exhibit better reading efficiency.

(i) The data also had another height variable reported in inches, rather than centimetres. This variable was simply converted from the height variable in centimetres (1 inch = 2.54 cm). Out of curiosity, you computed the correlation between this height variable (in inches) and reading efficiency. Do you expect the correlation would become higher than 0.5, lower than 0.5, or unchanged? Or do you think it is not possible to predict the results for certain? Provide your answer, and say briefly why you think so.

The correlation would be unchanged, because a correlation coefficient is not influenced by a linear transformation of a variable with a positive scaling factor (here, dividing by 2.54).
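The invariance can be checked numerically. The sketch below uses five hypothetical children (made-up heights and reading scores, not the 500-child dataset) and a hand-written Pearson function.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical data (NOT from the study)
height_cm = [140, 150, 160, 170, 180]
words_per_minute = [90, 120, 100, 150, 140]

# Convert centimetres to inches (1 inch = 2.54 cm)
height_in = [h / 2.54 for h in height_cm]

r_cm = pearson_r(height_cm, words_per_minute)
r_in = pearson_r(height_in, words_per_minute)
print(r_cm, r_in)  # the two values agree up to floating-point rounding
```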

(ii) Instead of correlation analysis, you conducted a regression analysis with height (cm) being the independent variable and reading efficiency being the dependent variable. Do you expect the regression coefficient would be positive, negative or zero? Or do you think it is not possible to predict the results for certain? Provide your answer, and say briefly why you think so.

The regression coefficient would be positive, because, when there is only one independent variable, the regression coefficient is simply a scaled correlation coefficient.
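"Scaled correlation coefficient" here means that the slope equals r multiplied by SD(y) / SD(x); since both SDs are positive, the slope has the same sign as r. A sketch with hypothetical data (not the study's data):

```python
from math import sqrt

# Hypothetical data (NOT from the study)
height_cm = [140, 150, 160, 170, 180]
words_per_minute = [90, 120, 100, 150, 140]

n = len(height_cm)
mean_x = sum(height_cm) / n
mean_y = sum(words_per_minute) / n
cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(height_cm, words_per_minute))
ss_x = sum((x - mean_x) ** 2 for x in height_cm)
ss_y = sum((y - mean_y) ** 2 for y in words_per_minute)

slope = cov / ss_x          # least-squares slope of y on x
r = cov / sqrt(ss_x * ss_y) # Pearson correlation
sd_ratio = sqrt(ss_y / ss_x)  # SD(y) / SD(x); the (n - 1) terms cancel

print(slope, r * sd_ratio)  # both expressions give the same slope
```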

(iii) From the original correlation analysis, can you conclude that height causally facilitates reading efficiency? Provide your answer, and say briefly why you think so.

No, because correlation does not necessarily speak to causation. For example, age is a plausible third variable that produced the spurious positive relationship between height and reading efficiency, because older children should be taller and have better reading efficiency.


SECTION D. Basic parametric ANOVA, 1-way and 2-way 

All questions should be answered in terms of classical parametric ANOVA

C1. The numbers below are hypothetical memory scores of 3 groups of children. There are two datasets (Dataset A and Dataset B). Both datasets have the same mean scores for the 3 groups.

Dataset A
Group A: 3, 5, 4, 4, 2    (n=5, mean: 3.6)
Group B: 2, 3, 2, 1       (n=4, mean: 2.0)
Group C: 5, 6, 4, 6       (n=4, mean: 5.25)

Dataset B
Group A: 1, 7, 3, 5, 2    (n=5, mean: 3.6)
Group B: 4, 1, 2, 1       (n=4, mean: 2.0)
Group C: 2, 9, 1, 9       (n=4, mean: 5.25)

(i) You want to test for a significant difference among the means of the 3 groups. Here the appropriate statistical test is one-way ANOVA. Which dataset do you think would be more likely to produce a statistically significant effect of group? Provide your answer, and say briefly why you think so.

Dataset A. The means are identical but Dataset A seems to have smaller variance (i.e., SD) within each group. You are more likely to obtain statistically significant effects in between-subjects ANOVA when you have smaller within-group variance.
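The reasoning can be checked by computing the one-way ANOVA F statistic (between-group mean square divided by within-group mean square) for both datasets. The helper below is a minimal hand-rolled sketch, not a replacement for a statistics package, and it does not compute p-values.

```python
def one_way_anova_f(groups):
    """F statistic of a classical one-way between-subjects ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    # Between-group sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: scores around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

dataset_a = [[3, 5, 4, 4, 2], [2, 3, 2, 1], [5, 6, 4, 6]]
dataset_b = [[1, 7, 3, 5, 2], [4, 1, 2, 1], [2, 9, 1, 9]]

f_a = one_way_anova_f(dataset_a)
f_b = one_way_anova_f(dataset_b)
print(f"Dataset A: F(2, 10) = {f_a:.2f}")
print(f"Dataset B: F(2, 10) = {f_b:.2f}")
```

Both datasets share the same between-group sum of squares because the group means and sizes are identical; only the within-group term differs, which is why F is much larger for Dataset A.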

(ii) Why would it NOT be a good idea to do 3 two-sample tests to test your research question (Group A vs. Group B, A vs. C, & B vs. C)?

This is because the repetition of statistical tests (i.e., t tests in this case) would inflate the Type-1 error rate. As a result, the probability of obtaining at least one false-positive finding would be higher than the conventional significance level (0.05).
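A rough sense of the inflation comes from the familywise error rate 1 - (1 - alpha)^k for k tests. The sketch below assumes the three tests are independent, which is a simplification: pairwise comparisons that share groups are not strictly independent, so the figure is only an approximation.

```python
alpha = 0.05  # per-test significance level
n_tests = 3   # A vs. B, A vs. C, B vs. C

# Probability of at least one false positive across all tests,
# ASSUMING the tests are independent (an approximation here)
familywise_error = 1 - (1 - alpha) ** n_tests
print(f"{familywise_error:.4f}")
```

Under that assumption the chance of at least one false positive is about 0.14, nearly three times the nominal 0.05.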


C2. The results below from a political survey show "liberalism" scores of 4 groups of people living in urban and rural communities in France and Italy.

French Urban:     3, 5, 8, 7
French Rural:     2, 6, 3, 5
Italian Urban:    4, 7, 3, 4
Italian Rural:    3, 2, 3, 1

i) These data could be analysed by a 2-way ANOVA. Provide a name for each variable, and say whether it is between-subject or within-subject:

Variable 1: Urbanicity (urban vs. rural), between subjects

Variable 2: Country (France vs. Italy), between subjects

ii) Compute the cell means, and the marginal means for each variable.

                  French    Italian    Marginal means
Urban             5.75      4.50       5.125
Rural             4.00      2.25       3.125
Marginal means    4.875     3.375
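The cell and marginal means can be reproduced from the raw scores as follows. This is a sketch; because every cell has the same n (4), averaging the cell means gives the same marginal means as pooling the raw scores.

```python
from statistics import mean

# Raw liberalism scores from the survey table above, keyed by (urbanicity, country)
scores = {
    ("Urban", "French"): [3, 5, 8, 7],
    ("Rural", "French"): [2, 6, 3, 5],
    ("Urban", "Italian"): [4, 7, 3, 4],
    ("Rural", "Italian"): [3, 2, 3, 1],
}

cell_means = {cell: mean(values) for cell, values in scores.items()}

# Marginal means: average the cell means across the other factor
urbanicity_means = {
    level: mean(m for (urb, _), m in cell_means.items() if urb == level)
    for level in ("Urban", "Rural")
}
country_means = {
    level: mean(m for (_, country), m in cell_means.items() if country == level)
    for level in ("French", "Italian")
}

print(cell_means)
print(urbanicity_means)
print(country_means)
```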

iii) Sketch a graph (e.g., using Excel) of the cell means which would allow you to judge whether there is an interaction between the 2 factors.

The lines are not perfectly parallel, indicating a very small interaction between the two IVs. However, the difference in slopes is so small that this interaction is very unlikely to reach statistical significance.
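The size of the interaction in the cell means can be expressed as a difference of differences, which would be exactly zero for perfectly parallel lines. The sketch below only describes the means; judging statistical significance would additionally require the within-cell variability, which it does not compute.

```python
# Cell means from the table above
cell_means = {
    ("Urban", "French"): 5.75,
    ("Rural", "French"): 4.00,
    ("Urban", "Italian"): 4.50,
    ("Rural", "Italian"): 2.25,
}

# Urban-minus-rural gap within each country (the "slope" of each line)
gap_french = cell_means[("Urban", "French")] - cell_means[("Rural", "French")]
gap_italian = cell_means[("Urban", "Italian")] - cell_means[("Rural", "Italian")]

# Difference of differences: zero means perfectly parallel lines
interaction_contrast = gap_french - gap_italian
print(gap_french, gap_italian, interaction_contrast)
```

The urban-rural gap is 1.75 for France and 2.25 for Italy, so the contrast is -0.5 on the liberalism scale, small relative to the spread of scores within each cell.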
