Assumed knowledge - Statistics PDF

Title	Assumed knowledge - Statistics
Author	Shannon Salama
Course	Design and Statistics II
Institution	Macquarie University


Summary

Assumed knowledge to complete design and statistics II...

Description


PSYC105 – Introduction to Psychology statistics stream

- PSYC105 is assumed knowledge for PSY248

- These notes cover only the PSY248 revision content (assumed knowledge) provided through the PSY248 iLearn in S2 2018, recorded for PSYC105 in S2 2017. Please note that they are therefore incomplete for PSYC105: Weeks 10, 12, and 13 were not included in the PSY248 access package, and Weeks 8 and 11 are omitted from these notes as they are revision lectures.

- Stata content is indicated in blue

Week 1: Introduction / Why Statistics? / Support

Week 2: Introduction and Research Design
Samples and Populations
IV vs DV / Descriptive vs inferential statistics
Example study – inference

Week 3: Introduction to iLab and STATA
Data input in Stata
Stata Walkthrough

Week 4: Summarising Data / Categories / Typicality and Variability
Numeric summaries: shape

Week 5: Fundamental Concepts / Normal distribution theory
Hypothesis testing
Statistical inference

Week 6: T-Tests Part 1 / One sample t-tests
One-sample t-test in Stata
Independent samples t-tests / Levene’s test in Stata
Independent samples t-test in Stata

Week 7: T-Tests Part 2
Viewing t-test data in Stata
Shapiro-Wilk test of Normality / swilk in Stata / Manual paired sample t-test
Running a paired t-test data in Stata
Confidence intervals
Effect sizes

Week 9: Correlations / Introduction to correlation and correlation analysis
Aspects of scatterplot analysis
Research context example, reading a scatterplot
Calculating correlation coefficient

Appendix: Notation symbols key, Stata command list, Helpful YouTube links


PSYC105 Statistic Stream, Week 1: Why statistics?

Week 1, Video 1: Introduction and Administration

- Gain access to the Stata program. You can access Stata for free through iLab (https://wiki.mq.edu.au/display/iLab/About) or on the computers in labs C5C 218 and 219 and in the library.

Week 1, Video 2: Why stats?

- Why am I studying numbers to understand feelings? Statistical methods are essential to the scientific (formal) quantitative study of psychology.

- If we have a theory or prediction about how people work, think, or interact with the world, we need formal research methods to test it consistently and to help avoid the biases that anecdotal evidence can lead to. Questions such as “Are people with chronic pain more prone to develop depression?” require data to answer.

- Psychology is usually interested in people on a large scale (e.g. adolescents as a group, rather than one teenager). But people vary and we can’t measure everyone. “People are complicated”, so we use statistics to aggregate and make inferences.

Aggregate

summarise across people, to deal with variation across people

Inferences

make generalisations from a small group sample to the greater general population

There is a direct overlap with Week 2 content at this point in the recording. Please see “Research steps” under Week 2 Video 1 for the scientific method and an example study.

Critical thinking

- The process through which you evaluate information and weigh the likelihood that it is true. It is an essential skill for academia and extremely useful in everyday life, e.g. countering “fake news” and combating misleading health claims.

Statistics Anxiety

- Is really common in psychology students!

- If you would like to revise arithmetic, algebra, geometry, or probability: http://www.khanacademy.org/math

Week 1, Video 3: Support

- Ask questions in the discussion forum, make appointments with tutors, and join PAL groups. There are many videos on YouTube, and additional content will be posted on iLearn.

End of lecture sequence


PSYC105 Statistic Stream, Week 2: Introduction to Statistics

Week 2, Video 1: Introduction and Research Design

- PSYC104 covers research design; the lecturer recommends revising it in Chapter 2 of the Lilienfeld textbook.

- Statistics are important but they cannot save a poorly designed study.

Research steps – The scientific method

1. Generate a research question: what are you interested in investigating? (Observations + reading literature + prior research → develop a theory.)

2. Generate hypotheses: what are your predictions?

3. Operationalise constructs: what are you measuring? How will you measure it (Questions? Machines?)?

4. Design and conduct the research study: what kind of study design? (How will you collect data to test the hypothesis?)

5. Collect the data: measure stuff (e.g. numeric data on medications, duration of condition, interventions etc.)

6. Analyse the data: statistics! To draw conclusions on whether we have evidence for or against a hypothesis

7. Draw conclusions: were your hypotheses supported or not? What are the limitations of the conclusions? What does this mean for the topic and wider field?

Example study

Research question: Is CBT effective in treating Generalised Anxiety Disorder (GAD)?

Design a study: RCT (randomised controlled trial) with anxiety measured pre-intervention, post-intervention, and 2 months later. Testing 200 people with GAD: 100 control, 100 CBT patients.

Quantitative research methods and statistics are needed to:

- Summarise anxiety scores across all participants

- See whether there is a decrease in anxiety in the CBT group, relative to the control

- See how likely it is that the decrease we see in the sample can be generalised to a wider population (i.e. whether a real change is likely to be reflected in the wider population)


Week 2, Video 2: Samples, populations, data, variables

Sample

A representative selection from the population, from whom you collect data

Population

The wider group you’re interested in

- Example of setting population and sample: population = all students in Australia vs sample = 500 MQU students. We acquire the data from the sample.

- Before designing the research study the researcher selects the sample based on the population; the sample must be representative.

- Research questions are about the population; data come from the sample, and conclusions are always made back to the population.

Parameter

A numeric summary of the population

Statistic

A numeric summary of the sample

- Example parameter vs example statistic:

Parameter = average age of Australian university students is 20 years (numeric summary of the population)

Statistic = average age of 100 randomly selected Australian university students is 22 years (numeric summary of the sample)
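
The parameter versus statistic distinction can be illustrated with a small Stata sketch. This is only an illustration under stated assumptions: the population is simulated rather than real, and the seed, the population size of 10,000, and the names age, parameter and statistic are hypothetical choices, not part of the lecture.

```stata
* Hypothetical illustration: every number here is simulated, not real data.
clear
set seed 12345
set obs 10000                           // pretend population of 10,000 students
generate age = round(rnormal(20, 3))    // simulated ages centred on 20

summarize age                           // mean of ALL rows = the parameter
scalar parameter = r(mean)

sample 100, count                       // keep a random sample of 100 rows
summarize age                           // mean of the sample = the statistic
scalar statistic = r(mean)

display "Parameter (population mean): " scalar(parameter)
display "Statistic (sample mean):     " scalar(statistic)
```

Drawing a different sample from the same population would give a slightly different statistic, which is why the sample value is treated as an estimate of the parameter rather than as the parameter itself.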

- As researchers we are more interested in parameters; these are what we want to understand. What we actually have data for are statistics.

Unit of observation

The level at which you’re interested in sampling, e.g. individual, school, organisation, country

Data

A collection of information that has been recorded from your sample; a summary of the sample set up as a spreadsheet

Variable

Piece of information you have collected that varies among participants (E.g. age, gender)

Types of data

Quantitative

Numeric information (e.g. age, height, weight, ATAR)

Qualitative

Descriptive; any non-numeric property (e.g. favourite colour, type of car, workplace)

Discrete

Numerical categories, e.g. year in school is 1, 2, etc. These are discrete categories; a student can’t be in Year 1.3

Continuous

Where a valid point can be anywhere on a scale, e.g. temperature can be 21.62 degrees

Measurement levels

Nominal

Unordered, categorical; attributes are only named, e.g. gender: male, female, other. The weakest level. Qualitative

Ordinal

Ordered categorical; attributes are ordered, e.g. education: high school, undergrad, postgrad. Qualitative or quantitative

Interval

Numeric scale with consistent differences between points (distance is meaningful), e.g. standardised IQ (a score of 0 would not mean you have no intelligence; the zero point is arbitrary). Quantitative

Ratio

Numeric scale with consistent differences between points AND an absolute zero, e.g. weight in kg (0 kg means no weight at all). Quantitative


Week 2, Video 3: Independent vs Dependent variables

Study designs

Experimental design

Researcher has the power to allocate participants to different levels (groups/conditions) of an independent variable (IV). The IV can cause change in the DV

Observational

No experimental control. The IV can be associated with or predict a change in the DV

Independent variable (IV)

Manipulated variable, or the variable researchers hypothesise will have an effect on behaviour. Use this to predict, explain, or cause a change in the outcome

Dependent variable (DV)

Outcome variable, hypothesised to be affected by the manipulated IV

Between-subject

Independent groups

Within-subject (repeated measures)

Related groups

Measurement error

A potential difference between the actual value of a phenomenon and the value of the data we collect about it

Example: IV – number of tutorials attended, DV – final end-of-year grade. We can’t conclude that tutorial attendance caused better performance, because people weren’t randomly allocated; they were only observed. It could be, for example, that those who study more also attend more tutorials. There are potentially other naturally occurring factors that could explain the relationship.

Variables

Extraneous variable

Another variable that is not the IV or DV

Confounding variable

A type of extraneous variable that can potentially explain the relationship between the IV and DV

Example: IV – age of child, DV – reading ability, confound – year of school. It may be that a higher level of education at school is the reason they are better at reading, not because they are older. Experimentally designed studies limit potential confounding variables.

Week 2, Video 4: Descriptive vs inferential statistics

Inference

Descriptive statistics

Describe sample only

Inferential statistics

Gather data from a sample and make generalisations (inferences) back to the population

Types of hypothesis

Research hypothesis

Developed from the research question, e.g. “the more PAL sessions a student attends, the better their final PSYC105 grade”

Statistical hypothesis

Null hypothesis

No difference between groups, no relationship, no change, nothing, e.g. “there is no relationship between PAL attendance and PSYC105 grade (DV)”

Alternate hypothesis

A difference between groups, a relationship, a change, something, e.g. “there is a relationship between PAL attendance and PSYC105 grade”. The alternate hypothesis doesn’t explain that relationship; it just indicates there is one

Fundamentals of inferential statistics

Ideally we could sample the whole population, or run a study many times to find the average effect, but these are impossible in practice. Instead we use inferential statistics to see how likely it is that a real effect would be reflected in the population. Process of inference: what is the chance that the effect we have obtained in this sample reflects an effect in the population?


Week 2, Video 5: Example study – inference

Example study to tie in all of the above: “Studying and exam performance”

Research question: what’s the most effective way to study: repeatedly throughout semester or right before the final exam?

Research hypothesis: repeated, weekly study throughout semester will result in better final exam performance than a solid block of studying a week before the exam.

Study design: experiment.

- IV: study type (nominal: weekly study vs block study, between-subjects manipulation)

- DV: final exam performance

- 100 students randomly allocated to each of the ‘weekly study’ and ‘block study’ groups

Collect data

Statistical analysis: on average, did the ‘weekly study’ group perform better than the ‘block study’ group?

- How big is the average difference? How consistent is the difference?

- Depending on how big and how consistent the difference is, does this difference likely reflect a real difference in the population?

This is a poly-histogram representing the distribution of the spread of scores. How big is the difference and how much overlap (how consistent) is there? This will tell us whether the effect we see is likely representing a real effect on the population.

To demonstrate consistency: the above graphs show the same mean difference (the difference between the two means is the same). The top graph has more overlap between distributions; it has more variability compared to the bottom, which is much more distinct; it has a more consistent difference with less variability.
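
The two questions in this example (how big is the difference, and how consistent is it) can be sketched in Stata with a handful of invented exam scores. The scores, the group coding and the label name grp are assumptions made up for this illustration; they are not data from the lecture.

```stata
* Hypothetical exam scores: 5 students per group, invented for illustration.
clear
input group score
1 72
1 75
1 78
1 74
1 76
2 65
2 68
2 71
2 66
2 70
end

label define grp 1 "weekly study" 2 "block study"
label values group grp

* Mean = how big the typical score is; sd = how much scores vary in a group
tabstat score, by(group) statistics(mean sd n)
```

The means speak to "how big", and the standard deviations give a rough sense of "how consistent": the smaller they are relative to the difference between the means, the less the two groups overlap.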

End of lecture sequence


PSYC105 Statistic Stream, Week 3: Introduction to iLab and Stata

Week 3, Video 1: Introduction to iLab and Stata

- To access iLab you must download the client app, install and launch it.

- iLab about: https://wiki.mq.edu.au/display/iLab/about

- Client index: https://wiki.mq.edu.au/display/iLab/Clients

- After launching the VMware Horizon Client, accept the disclaimer and log in to the MQAUTH domain with your student ID and password to access the university desktop.

- Open the Stata interface. There is a User’s Guide in the Help menu.

Note: Windows and Mac interfaces of the program may look different.

- If you accidentally close a section, go to the Window menu to turn it back on.

- Remember when opening files that, if you are using the program through iLab, you are using the desktop of a different computer, not your own, so do this online.

Two options to get Stata to action:

- Submit commands in Stata’s language into the Command section at the bottom of the screen

- Use the menus at the top to give instructions

- A Do-file Editor (pencil and paper icon with drop-down arrow) lets you keep a list of all the written commands you are using. It is a text window you can run commands from.

- The Data Browser allows you to look at data without changing it; the Data Editor allows editing.

- To exit iLab, quit the VMware Horizon Client.


Week 3, Video 2: Data input in Stata

- When setting up spreadsheets it’s important to consider what format to store the data in. There are two types of data storage. Remember the different kinds of variables: qualitative (nominal), quantitative (interval, ratio), and either (ordinal).

- Only numbers are entered into the dataset for numeric-type data. Statistical programs cannot analyse string-type data.

Numeric type variable

Numbers only are entered. Statistical programs can only use numeric type data to perform analysis. (In Stata: byte, int, long, float, double)

String type variable

Letters, numbers, characters etc. (a string of characters). Qualitative data needs to be assigned a representative number, e.g. Female = 1, Male = 2. (In Stata: str)
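
As a sketch of how text categories become numbers in Stata, the example below uses encode, which assigns codes in alphabetical order and so happens to reproduce the Female = 1, Male = 2 scheme. The three-row dataset and the variable names gender_str and gender are hypothetical.

```stata
* Hypothetical three-row dataset with a string (text) variable.
clear
input str6 gender_str
"Female"
"Male"
"Female"
end

* encode builds a numeric variable plus matching value labels;
* codes follow alphabetical order, so Female = 1 and Male = 2.
encode gender_str, generate(gender)

describe          // compare storage types: str6 (string) vs long (numeric)
list              // gender displays its value labels
list, nolabel     // gender displays the underlying numbers
```

Entering the numbers directly and attaching value labels afterwards is an equally valid route; encode is simply convenient when the data already arrive as text.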

Types of Stata files

.dta

Native dataset file for Stata

.do

Compendium of written commands for programming/syntax (“do-file”)

.gph

Graph file

Stata Variables

- All variables must have a name

- Variable names have strict rules

  - You may use upper and lower case, but Stata is case-sensitive so you must be consistent

  - Maximum of 32 characters, and the first character must be a letter

  - Only letters, numbers, and underscores are accepted. You cannot use spaces or symbols.

- Variable labels allow you to enter longer descriptions of variables

- Value labels are the pieces of information you give Stata to define the coding scheme for categorical variables (e.g. Female = 1).
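
A minimal sketch of naming and labelling in Stata follows, assuming a tiny made-up dataset. The variable names id and sex, the label name sexlbl and the label text are hypothetical; only the commands themselves (label variable, label define, label values) are standard Stata.

```stata
* Hypothetical dataset: sex is stored as a number (1 or 2).
clear
input id sex
1 1
2 2
3 1
end

* Names are case-sensitive: sex and Sex would be two different variables.
label variable sex "Sex of participant"   // variable label: longer description
label define sexlbl 1 "Female" 2 "Male"   // value label: the coding scheme
label values sex sexlbl                   // attach the scheme to the variable

describe          // shows the variable label and the attached value label
list              // shows Female / Male
list, nolabel     // shows the stored codes 1 / 2
```

The stored data stay numeric throughout; the labels only change how the values are displayed.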

Exploring Stata

- To open a Stata data file: File → Open, like any other program.

- Find the .dta you need.

- When you open a file, Stata will display what you have done in the Results viewer screen (use “file”).

- In the bottom right-hand menu you can read the properties of the variables.

- Clicking the Data Editor icon will open a data interface


Week 3, Video 3: Stata Walkthrough

If accessing through iLab:

- Log in to iLab

- Locate the Stata IC 15 desktop icon and open it

Stata can use data it can read off a website. Instead of using File → Open, you can tell Stata where to find the data set online by typing instructions into the Command dialogue.

Tutorial

1. Type into Command: webuse set www.psy.mq.edu.au/psystat/Stata_data/

2. Press Enter to run

3. Type into Command: webuse Wk3.dta

4. Press Enter to run

5. Click Data Browse to view the new dataset. When text is written in blue font, it is displaying the value labels of a numeric value. To check these values you can look in the Properties menu (bottom right) or look in the box in the top menu.

6. To change this display back to numerical, go to the Data Editor’s Tools → Value Labels → untick Hide all value labels. Then retick Hide all value labels for the next exploration.


7. Select an item in the data to explore its properties. In the Properties box (bottom right), next to Value label, is an ellipsis (…). Click the … to open information about the data item’s label. Clicking the + next to a set will expand it.

8. To sort data go to Data → Sort; a menu will open. Stata will not take data out of order. To customise the order, click the Variables drop-down menu, select what you want to sort by, and click OK. Return to the main review screen. You will notice that Stata has recorded what you have done. We will now sort it back manually to id:

9. Type into Command: sort id

10. Press Enter to run. If no error appears, it has been successful. Changing the order of the data does not change the data; it just displays it in a different way. Remember, Stata is case-sensitive and this dataset defines id in lowercase. If you had entered it in uppercase, an error message would have appeared in red. The code below the error will open a viewer window displaying help for that error code, informing you of what the error is.

11. Make a new variable manually: We are calling this new variable “sng”. We define the function of our new variable “sng” after the = sign. The function of this new variable will be to show the mean score of two of our existing variables. Type into Command: egen sng = rowmean(PSYC104 PSYC105)

12. Press Enter to run. Stata will create a float data type by default.
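
Steps 9 to 12 can be tried without the course file. The sketch below assumes a small invented dataset standing in for Wk3.dta, whose actual contents are not shown in these notes; the ids and marks are hypothetical.

```stata
* Hypothetical stand-in for Wk3.dta: ids and marks invented for illustration.
clear
input id PSYC104 PSYC105
3 60 70
1 80 90
2 55 65
end

sort id                               // reorders the rows; values are unchanged
* sort ID                             // would fail: there is no variable named ID

egen sng = rowmean(PSYC104 PSYC105)   // row-wise mean of the two marks
describe sng                          // storage type is float by default
list
```

Because rowmean works across each row, every participant gets their own mean of the two unit marks, regardless of the order the rows are sorted in.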


13. Make a new variable through menu: Data → Create or change data → Create a variable (extended)

14. Type sng2 in the Generate ...