Statistics Notes

Lecture 3 - Hypothesis Testing

The hypothesis has two components:

- Null hypothesis (H0): no difference or change in results, i.e., nothing has happened
- Alternative hypothesis (H1): a difference or change in results, i.e., something has happened

Valid bases for forming hypotheses:

- Intuition – based on opinion, faith, belief, or feelings (common sense)
- Authority – knowledge about behaviour from an expert or trustworthy source
- Rational induction – based on the combination of facts
- Empirical science – knowledge about behaviour tested and confirmed via the scientific method
  o Only valid method for testing a hypothesis

Categorical vs Numerical Methods:

- Empirical research methods (gathering numeric, measurable data)
- Categorical (qualitative) – data gathered is descriptive (nominal or ordinal)
- Numerical (quantitative) – data gathered is numeric (interval or ratio) and analysed using quantitative analytic methods
- Can be mixed method – complementary approaches for complementary information

Categorical

Nominal (numeric) data:

- The count or number (freq) in each category
- Proportion or percentage in each category
- For one variable: tabulate variable
- For multiple variables: tabulate variable1 variable2

Ordinal (graphical) data:

- Plot the number or percentage/proportion in each category using a bar chart
  o X-axis = categories
  o Y-axis = count/percentage/proportion
- Frequency on the y-axis: graph bar (count), over(variablename)
- Percentage on the y-axis: graph bar (percent), over(variablename)

Lecture 4 – Measurement and the Central Limit Theorem

Measurements – the data taken on subjects in a study according to the variables of interest

Numerical and graphical summaries

One categorical variable:

Numeric:

- Use frequency tables showing the percentage or proportion in each category of a nominal or ordinal variable
- tabulate variablename

Graphical:

- Bar charts / pie charts showing the frequency / percentage of observations in each category
- graph bar (count), over(variablename)
- graph bar (percent), over(variablename)
- graph pie, over(variablename)

One numerical variable:

Numeric:

- Calculation of summary or descriptive statistics
  o Mean or median (for the centre)
  o SD or IQR (for spread)
  o Variance and range
- tabstat variablename, statistics(mean, sd, range, median, iqr)

Graphical:

- Histogram
  o histogram variablename
- Boxplots
  o graph hbox variablename
  o Median = central line
  o IQR = edges of the box
  o Range = lowest and highest points

Bivariate Summaries

Most studies are interested in the relationship between two (or more) variables.

- Outcome of interest (hypothesis) = dependent variable (DV)
- Independent variable (IV) = variable used to predict the outcome

Two categorical variables:

Numerical:

- tabulate variable1 variable2, row
- Cross-tabulation of one categorical variable by another (contingency table)

Graphical:

- graph bar (percent), over(variable1) over(variable2)
- graph bar (count), over(variable1) over(variable2)
- Plot the count / percentage of a categorical variable within each category of another categorical variable

Two numerical variables:

Numeric:

- Pearson correlation – a numerical summary that describes the strength and direction of the linear relationship between two variables
- r = sample correlation
- 𝜌 = population correlation

Correlation formula:

$r = \frac{cov(x, y)}{S_x S_y}$

- Combines the variability within each variable ($S_x$, $S_y$) with the variability between the two variables (covariance)
- correlate variable1 variable2

Graphical:

- Scatterplot
  o Shows the relationship between two variables
  o Each point is an x-y pair from the same individual
  o Can show a positive relationship, a negative relationship, no relationship, or a non-linear relationship
- scatter variable1 variable2
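The correlation formula above can be checked by hand. The sketch below is written in Python rather than Stata, purely to show the arithmetic; the data values and the function name `pearson_r` are illustrative assumptions, not part of the lecture material.

```python
import statistics

# Hypothetical paired observations (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

def pearson_r(xs, ys):
    """Sample correlation r = cov(x, y) / (S_x * S_y)."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sample covariance uses n - 1, matching the sample standard deviations.
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / (n - 1)
    return cov_xy / (statistics.stdev(xs) * statistics.stdev(ys))

r = pearson_r(x, y)
print(round(r, 3))
```

In Stata, correlate variable1 variable2 reports the same quantity for a real dataset.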

One categorical and one numerical variable:

Numerical:

- Calculate descriptive statistics for each category
- Compare means, medians, and variability (SD, variance, IQR, range) statistics
- by categorical_variablename, sort: tabstat numeric_variablename, statistics(mean, sd, range, median, iqr)

Graphical:

- Comparative boxplots
- Compare descriptive statistics + gaps, outliers
- graph box numeric_variablename, over(categorical_variablename)

Correlation

Pearson’s correlation coefficient:

- Ranges from -1.00 to +1.00
- The further from 0, the stronger the correlation

  0 to ±0.10        Very weak-to-no relationship
  ±0.10 to ±0.30    Weak relationship
  ±0.30 to ±0.50    Moderate relationship
  ±0.50 to ±1.00    Strong relationship

- Correlation is both a test statistic and a measure of effect size
- Check aspects of the scatterplot
  o Monotonic (does the trend keep going in one direction?)
  o Linear
  o Direction of association (positive or negative)
  o Gaps(?)
  o Outliers(?)
- If the data checks out
  o Comment on the direction of association
  o Calculate and comment on the correlation strength
- y-axis = DV, x-axis = IV
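As a quick way to apply the strength bands above, here is a minimal Python sketch. The function name `correlation_strength` is hypothetical, and because the bands in the notes share their endpoints, assigning a boundary value (e.g. exactly 0.30) to the stronger band is an assumption made here.

```python
def correlation_strength(r):
    """Label a Pearson correlation using the bands from the notes."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1.00 and +1.00")
    size = abs(r)  # strength depends on distance from 0, not on sign
    if size < 0.10:
        return "very weak-to-no relationship"
    if size < 0.30:
        return "weak relationship"
    if size < 0.50:
        return "moderate relationship"
    return "strong relationship"

direction = "negative" if -0.42 < 0 else "positive"
label = correlation_strength(-0.42)
print(direction, label)
```

The sign only gives the direction of association; the label comes from the absolute value.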

Statistics and Parameters:

            Sample statistics    Population parameters
Mean        x̄ (M)                𝜇
Median      x̃ (Mdn)              𝜇-tilde
Std dev     𝑠 (SD)               𝜎
Variance    𝑠²                   𝜎²

Summary

Numerical summaries

Variables                            Numerical summary
One categorical                      frequency tables (count, %)
One numerical                        mean, median, SD, IQR, variance
Two categorical                      contingency tables (count, %)
One categorical + one numerical      mean, median, SD, IQR, variance by group
Two numerical                        correlation

Graphical summaries

Variables                            Graphical summary
One categorical                      bar charts or pie charts
Two categorical                      clustered bar charts
One categorical + one numerical      comparative box plots
Two numerical                        scatter plots

Sampling distributions

- Hypothesis testing assumes the sampling distribution of the mean is normal
- With a large enough sample size, sample means are approximately normally distributed regardless of the shape of the population
- This is the central limit theorem
- Variability of the sample mean (called the standard error, SE = SD of the sample mean) decreases as sample size increases:

  $SE = \frac{s}{\sqrt{n}}$

- Used when making assumptions that relate to normality
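To see how the standard error shrinks as the sample grows, the sketch below evaluates SE = s / √n at three sample sizes. It is in Python only to show the arithmetic; the sample SD of 10 and the name `standard_error` are illustrative assumptions.

```python
import math

def standard_error(s, n):
    """Standard error of the mean: SE = s / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return s / math.sqrt(n)

# Hypothetical sample SD of 10 at three sample sizes (illustrative values).
se_by_n = {n: standard_error(10.0, n) for n in (25, 100, 400)}
for n, se in se_by_n.items():
    print(f"n = {n}: SE = {se}")
```

Quadrupling the sample size halves the standard error, because n sits under a square root.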

Lecture 5 - One-Sample Tests

The process:

1. State the null and alternative hypotheses based on known or postulated beliefs about a population
2. Choose the significance level for the test (𝛼, i.e., alpha). Usually 5% (𝛼 = 0.05)
3. Choose an appropriate test statistic and use the sample to check the assumptions for that test
4. Calculate the test statistic using sample data
5. Calculate the probability of having a test statistic as extreme or more extreme than the one you found (p-value) if the null hypothesis is true
6. Make a verdict about the null hypothesis: if the p-value ≤ 𝛼 we reject H0; if the p-value > 𝛼 we fail to reject H0
7. Write a conclusion about your research question

Test statistics are the standardised difference between the null value and the sample statistic. We standardise the difference by dividing it by the standard error.

M = mean of sample
𝜇 = population mean
𝜎 = population standard deviation
n = sample size
𝜒² = chi-squared
Σ = sum

z-score: $z = \frac{\text{observed score} - \mu}{\sigma}$

standard error (SE) = $\frac{\sigma}{\sqrt{n}}$

One-sample z-test for a mean:

USED WHEN:

- Data is numeric
- Sample is drawn from a normal distribution (graph the sample data if this is not given)
- Observations are independent
- 𝜎 must be given

All assumptions must be met

$z = \frac{M - \mu}{\sigma / \sqrt{n}}$

ztest variablename == mu0, sd(sigma)

z close to 0 = large p-value; z far from 0 = small p-value
Small p-value = the sample result would be very unlikely if H0 were true = reject H0 in favour of H1
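A worked example may help connect the z formula to the p-value. The sketch below is Python rather than Stata and only illustrates the arithmetic; the numbers (M = 103, hypothesised mean 100, 𝜎 = 15, n = 100) and the name `one_sample_z_test` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test."""
    se = sigma / math.sqrt(n)           # standard error of the mean
    z = (sample_mean - mu0) / se        # standardised difference
    # Two-sided p-value: probability of a z at least this far from 0 under H0.
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical inputs (illustrative values only).
z, p = one_sample_z_test(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(round(z, 2), round(p, 4))
```

With these made-up inputs the p-value falls just below 0.05, so H0 would be rejected at the 5% level.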

One-sample t-test for a mean:

USED WHEN:

- We don’t know 𝜎, the population standard deviation
  o We estimate it using the sample standard deviation (s)
  o When 𝜎 is unknown, we use the t-test statistic with v = n – 1 degrees of freedom (df)
- Scores are numeric
- Scores are approximately normally distributed, not too skewed
- Observations are independent

$t = \frac{M - \mu}{s / \sqrt{n}}$

ttest variablename == mu0

APA format: A one-sample t-test was conducted to determine ___. Results indicate that ___ (M, SD), t(df) = x, p = ___ (no 0 before the decimal point).
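The t statistic and its degrees of freedom can be computed by hand as a check on Stata's output. This Python sketch uses made-up data and a hypothetical function name `one_sample_t`; it stops at the test statistic because the Python standard library has no t distribution, so the p-value would still come from ttest in Stata.

```python
import math
import statistics

def one_sample_t(scores, mu0):
    """Return (t, df) for a one-sample t-test of H0: mu = mu0."""
    n = len(scores)
    m = statistics.mean(scores)       # sample mean M
    s = statistics.stdev(scores)      # sample SD s (n - 1 denominator)
    t = (m - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical scores and hypothesised mean (illustrative values only).
scores = [5.1, 4.9, 5.6, 5.8, 6.0]
t_stat, df = one_sample_t(scores, mu0=5.0)
print(round(t_stat, 2), df)
```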

Chi-squared goodness of fit test

USED WHEN:

- Scores are categorical
  o We want to ask questions about the number or proportion in each category of that categorical variable
    § E.g., are 50% of participants female?
    § Are the proportions of each category equal?
    § Does the sample match relevant statistics?
  o We are asking whether the proportions of observations across categories differ from some known (expected) proportions
- Observations are independent
- Expected frequency in each category is at least 5

In chi-squared goodness of fit tests we hypothesise about the proportions in each category:

$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$

Where
O_i = observed value for category i
E_i = expected value for category i

Square the difference between each observed and expected value, divide by the expected value, then sum over the categories.

To find the p-value: display chi2tail(df, test_statistic)

Stata has no inbuilt command for performing the whole test, so do this instead:
findit csgof -> click link to install package
then type
csgof categorical_variablename, expperc(perc1, perc2, etc.)
replacing perc1, perc2, etc. with the expected (hypothesised) percentage in each category

APA format: A chi-squared goodness-of-fit test was conducted to determine whether ___. There was/was no evidence that ___, 𝜒²(df, sample size) = ___, ___.
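The goodness-of-fit statistic is simple enough to compute by hand. The Python sketch below does so for made-up counts; the function name `chi_squared_gof` is hypothetical, and df is taken as the number of categories minus 1. The p-value is left to display chi2tail(df, test_statistic) in Stata.

```python
def chi_squared_gof(observed, expected_proportions):
    """Return (chi-squared statistic, df) for a goodness-of-fit test."""
    if len(observed) != len(expected_proportions):
        raise ValueError("need one expected proportion per category")
    n = sum(observed)
    expected = [n * p for p in expected_proportions]
    # Assumption check from the notes: every expected frequency is at least 5.
    if min(expected) < 5:
        raise ValueError("expected frequency below 5 in some category")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, len(observed) - 1

# Hypothetical counts: 30 and 20 in two categories, testing a 50/50 split.
chi2, df = chi_squared_gof([30, 20], [0.5, 0.5])
print(chi2, df)
```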

Lecture 6 – Non-experimental Data

- Arise when the researcher simply observes the subject or object under investigation without applying any intervention
- Looking for associations / relationships between a dependent variable and independent variable(s)
- If found, we can only describe associations; we cannot know for sure that there is a causative effect

Shapiro-Wilk test for normality

USED WHEN:

- We need to check whether the population from which the sample is drawn is normally distributed
- Numeric variables only
- Used to see if the assumption of normality is met, IN CONJUNCTION with a graph and/or numeric descriptive statistics
- H0 = normally distributed
- H1 = not normally distributed
- We don’t want a significant result (a non-significant result means no evidence against normality)

In Stata:
Graph form: histogram variablename
Table form: swilk variablename
p-value = Prob>z column

Two-sample t-tests of means

USED WHEN:

- Comparing two independent groups
- DV is numeric
- DV is normally distributed in both groups (use histograms and Shapiro-Wilk)
- Approximately equal variance between the two groups (check with Levene’s test)
- Observations are independent (within and between groups)

Hypothesis: H0: 𝜇1 = 𝜇2 vs H1: 𝜇1 ≠ 𝜇2

Use 𝛼 = 5% significance level

Test statistic is:

$t = \frac{\bar{x} - \bar{y}}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

Where
$\bar{x}$ and $\bar{y}$ = sample means of DV in groups 1 and 2
$n_1$ and $n_2$ = sample sizes in groups 1 and 2
$S_p$ = pooled standard deviation

$S_p = \sqrt{\frac{a(n_1 - 1) + b(n_2 - 1)}{n_1 + n_2 - 2}}$

Where a and b = sample variances in groups 1 and 2

Step 1: compare the variability of each group
by categorical_variablename, sort: summarize numeric_variablename

Step 2: check normality of each group
histogram numeric_variablename, by(categorical_variablename) freq
by categorical_variablename, sort: swilk numeric_variablename

Step 3: test equality of variances (Levene’s test)
robvar numeric_variablename, by(categorical_variablename)

Simple method using Stata:
ttest numeric_variablename, by(categorical_variablename)

If p-value ≤ 0.05, we reject the null hypothesis.

APA format: A two-sample t-test for the difference in mean between ___ and ___ indicated that ___.
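To make the pooled formula concrete, here is a minimal Python sketch of the two-sample t statistic, using a and b for the two sample variances as in the notes. The data and the function name `pooled_t` are illustrative assumptions; the p-value is left to ttest in Stata.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    a = statistics.variance(group1)   # sample variance, group 1
    b = statistics.variance(group2)   # sample variance, group 2
    # Pooled SD weights each variance by its degrees of freedom.
    sp = math.sqrt((a * (n1 - 1) + b * (n2 - 1)) / (n1 + n2 - 2))
    diff = statistics.mean(group1) - statistics.mean(group2)
    t = diff / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical DV scores in two independent groups (illustrative values).
t_stat, df = pooled_t([4.0, 5.0, 6.0], [1.0, 2.0, 3.0])
print(round(t_stat, 3), df)
```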

Significance levels: Why 5%?

Used to limit the chance of a Type I error, which happens if:

- We reject the null hypothesis when we shouldn’t (false positive result)
- We accept the alternative hypothesis when we shouldn’t
- The sample effect is due to chance

Type II error:

- Not rejecting the null hypothesis when we should (false negative result)
- Rejecting the alternative hypothesis when we shouldn’t
- The sample isn’t detecting a real population effect

Power

- Probability that we correctly reject the null hypothesis (i.e., reject H0 when it is false)
- Influenced by:
  o Significance level – as 𝛼 increases, power also increases
  o Sample size – as 𝑛 increases, power increases because the SE decreases
  o Variability in the DV – more variable = less power (harder to reject H0)
  o Magnitude of difference between hypothesised and true values – small difference = less power (harder to reject H0), big difference = more power (easier to reject H0)

Lecture 7 – Experimental Data

Background

- Arise when the researcher applies an intervention to the subject or object under investigation
- Aim to find causal relationships between a dependent variable and independent variable(s)
- More validity than observational studies: a controlled environment helps rule out spurious associations

Two-sample t-test of means

We test H0: 𝜇1 = 𝜇2 versus H1: 𝜇1 ≠ 𝜇2

Significance level 𝛼 = 5%

Steps to complete:

1. Check for normality (Shapiro-Wilk)
   by categorical_variablename, sort: swilk numeric_variablename
2. Check for equality of variances (Levene’s test)
   robvar numeric_variablename, by(categorical_variablename)
3. Find test statistic and p-value
   ttest numeric_variablename, by(categorical_variablename)

If p-value ≤ 0.05, we reject the null hypothesis.

APA format: Researchers concluded that ___.

Experimental studies allow us to make firmer conclusions and recommendations

Paired t-test

Test conducted on participants of the same group, rather than two separate groups.

- All subjects receive both conditions in random order, OR matched pairs (couples, twins, age, sex etc.) are used, where a random member of each pair receives the intervention

We test H0: 𝜇𝑑 = 0 versus H1: 𝜇𝑑 ≠ 0

𝜇𝑑 = mean of differences in the population

Assumptions made:

- Differences are numeric
- Sample of differences is approximately normally distributed
- Differences are independent

t-test statistic:

$t = \frac{M_d - \mu_d}{s_d / \sqrt{n_d}}$

with degrees of freedom $df = n_d - 1$

In Stata:
ttest variable1 == variable2

Provides mean, SD, t-test statistic, df and p-value
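The paired t statistic is a one-sample t statistic applied to the differences, which the following Python sketch makes explicit. The before/after scores and the function name `paired_t` are illustrative assumptions; the hypothesised mean difference is 0, as in H0 above.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t, df) for a paired t-test of H0: mean difference = 0."""
    if len(first) != len(second):
        raise ValueError("paired data need equal-length lists")
    # One difference per pair (second minus first).
    diffs = [b - a for a, b in zip(first, second)]
    n_d = len(diffs)
    m_d = statistics.mean(diffs)      # mean of the differences
    s_d = statistics.stdev(diffs)     # SD of the differences
    t = (m_d - 0) / (s_d / math.sqrt(n_d))
    return t, n_d - 1

# Hypothetical scores for the same four participants under two conditions.
before = [10.0, 12.0, 9.0, 11.0]
after = [12.0, 15.0, 10.0, 14.0]
t_stat, df = paired_t(before, after)
print(round(t_stat, 2), df)
```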

Paired t-tests vs one-sample t-tests

- Same formula and degrees of freedom
- Only difference is that we know the differences came from paired observations
- Paired t-test shows the scores of both variables; one-sample t-test shows only the difference in scores

Confidence intervals for means

An interval estimate gives us a range of believable values for the parameter. We call this range of believable values a confidence interval. Samples from the same population tend to give differing results because of sampling variability.

- 95% for STAT1103
- Means that if multiple samples are taken from a population, about 95% of the resulting intervals will contain the true parameter value (𝜇)
- Confidence level and significance level (𝛼) are complements of each other (confidence level = 1 − 𝛼)
- Sample estimate ± critical value × standard error
- Critical value = value that cuts off a total of 5% across both tails (2.5% in each tail) of the relevant distribution (z or t)

We test H0: 𝜇 = 𝜇0 versus H1: 𝜇 ≠ 𝜇0, where 𝜇0 is the hypothesised value

Significance level 𝛼 = 5%

Steps to complete (if 𝜎 is known):

1. One-sample z-test

   $z = \frac{M - \mu}{\sigma / \sqrt{n}}$

2. Rearrange to create interval formula

   $M \pm z_{crit} \times \frac{\sigma}{\sqrt{n}}$

In Stata: ztest variable == mu0, sd(sigma)

Steps to complete (if 𝜎 is not known):

1. One-sample t-test

   $t = \frac{M - \mu}{s / \sqrt{n}}$

2. Rearrange to create interval formula

   $M \pm t_{crit} \times \frac{s}{\sqrt{n}}$

In Stata: ttest variable == mu0
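For the case where 𝜎 is known, the interval formula can be evaluated directly. This Python sketch uses made-up inputs (M = 103, 𝜎 = 15, n = 100) and a hypothetical function name `z_confidence_interval`; the t-based interval is not shown because the Python standard library has no t distribution for the critical value.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (lower, upper) for M +/- z_crit * sigma / sqrt(n)."""
    alpha = 1 - confidence
    # Critical value leaves alpha / 2 in each tail of the standard normal.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs (illustrative values only).
lower, upper = z_confidence_interval(sample_mean=103.0, sigma=15.0, n=100)
print(round(lower, 2), round(upper, 2))
```

For a 95% interval the critical value is about 1.96, so the margin here is roughly 1.96 standard errors either side of M.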

Steps to complete for paired t-test

$M_d \pm t_{crit} \times \frac{s_d}{\sqrt{n_d}}$

In Stata: ttest variable1 == variable2

A non-significant p-value means the interval includes the hypothesised value of 𝜇

Steps to complete for two-sample t-tests

$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$ with $df = n_1 + n_2 - 2$

Formula rearranged to give a 95% confidence interval for 𝜇1 − 𝜇2:

$(\bar{x}_1 - \bar{x}_2) \pm t_{crit} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

A significant p-value means the interval does not include 0 (𝜇1 − 𝜇2 = 0)

Correlation test

pwcorr variable1 variable2, sig

gives us the correlation and its p-value. An interval would help show the variability in the estimated correlation:

ssc install ci2
ci2 variable1 variable2, corr

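Summary

One-sample z-test
  Test statistic: $z = \frac{M - \mu}{\sigma / \sqrt{n}}$
  Confidence interval: $M \pm z_{crit} \times \frac{\sigma}{\sqrt{n}}$

One-sample t-test
  Test statistic: $t = \frac{M - \mu}{s / \sqrt{n}}$
  Confidence interval: $M \pm t_{crit} \times \frac{s}{\sqrt{n}}$

Paired t-test
  Test statistic: $t_d = \frac{M_d - \mu_d}{s_d / \sqrt{n_d}}$
  Confidence interval: $M_d \pm t_{crit} \times \frac{s_d}{\sqrt{n_d}}$

Two-sample t-test
  Test statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
  Confidence interval: $(\bar{x}_1 - \bar{x}_2) \pm t_{crit} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

Correlation
  Test statistic: $t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$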