The 5 Ws - wagner PDF

Title The 5 Ws - wagner
Course Introduction to Applied Statistics I
Institution University of Alberta
The 5 W’s

Definition:

- The entire collection of individuals is called the population of interest.
- A sample is a subset of the population, selected for study in some prescribed manner.
- A parameter is a number that describes a characteristic of the population (often unknown).
- A statistic is a number that describes a characteristic of the sample (known once the data are observed).
- We typically use Greek letters to denote parameters and Latin letters to denote statistics.

[Diagram with the labels: Population, Parameter, Sampling, Inference, Data, Statistic]

Example: I want to estimate the proportion of male students in this Stat 151 lecture by randomly selecting 10 students in this class.

Population of Interest:

Sample:

Parameter:

Statistic:

Example: An investigator wants to estimate the average height of Canadian females by measuring the heights of 1000 randomly selected Canadian females.

Population of Interest:

Sample:

Parameter:

Statistic:
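The distinction between a parameter and a statistic in the two examples above can be sketched in a few lines of Python. This is only an illustration: the class roster (200 students, 90 of them male) and the height distribution (mean 162 cm, standard deviation 7 cm) are made-up numbers, not data from the course. The Greek-style names `p` and `mu` stand for parameters, and `p_hat` and `x_bar` for statistics.

```python
import random
import statistics

rng = random.Random(151)  # fixed seed so the sketch is reproducible

# Hypothetical Stat 151 roster: 200 students, 90 of them male (made-up numbers).
lecture = ["male"] * 90 + ["not male"] * 110
p = lecture.count("male") / len(lecture)        # parameter: population proportion

sample = rng.sample(lecture, 10)                # random sample of n = 10 students
p_hat = sample.count("male") / len(sample)      # statistic: sample proportion

# Hypothetical population of heights in cm (simulated, not real measurements).
heights = [rng.gauss(162.0, 7.0) for _ in range(100_000)]
mu = statistics.fmean(heights)                  # parameter: population mean

height_sample = rng.sample(heights, 1000)       # random sample of n = 1000
x_bar = statistics.fmean(height_sample)         # statistic: sample mean

print(f"p = {p:.3f}, p_hat = {p_hat:.3f}")
print(f"mu = {mu:.2f} cm, x_bar = {x_bar:.2f} cm")
```

In practice only `p_hat` and `x_bar` would be available; the parameters are computable here only because the whole population was simulated.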


Definition: Data can be numbers, record names, or other labels. To provide context, the data need five “W’s” and one “H”: Who, What, When, Where, Why, and How.

Who:

Who are you interested in? The Who of the data tells us the individual cases about which (or whom) we have collected data.

- Subjects or participants: people on whom we experiment. The entire set of subjects is the population. The set of subjects you observe is your sample.
- Respondents: individuals who answer a survey.
- Experimental units: animals, plants, objects.

What:

What characteristic (or variable) are you measuring? Variables are characteristics recorded about each individual. A variable can take different values for different individuals. There are two types of variables: categorical (qualitative) and numerical (quantitative).

1. A categorical variable names categories and answers questions about who falls into those categories. For example, nationality or licence plate number. These variables can be represented with words and/or numbers. There are two types:
   I. Nominal: a categorical variable that has no logical order. Example: hair color (blonde, white, black, red, etc.)
   II. Ordinal: a categorical variable that has a logical order. Example: letter grade (A+, A, A-, B+, B, B-, C+, C, C-, D+, D, F)

2. A quantitative variable answers questions about the quantity of what is being measured. For example, distance or height. These variables must be represented with numbers, and those numbers must have units (meters, degrees Celsius, seconds, etc.). There are two types:
   I. Discrete: can only take on distinct values. Example: age (0, 1, 2, 3, ... in years)
   II. Continuous: can take on any value in a given interval. Example: weight (75.5435345... kg)
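The four variable types can be written down as a small Python sketch. The names `VariableType`, `examples`, `GRADE_ORDER`, `grade_rank`, and `is_categorical` are hypothetical helpers introduced here for illustration; the example variables are the ones used in the list above. The point of `grade_rank` is that an ordinal variable has a logical order, so its categories can be ranked, which is not meaningful for a nominal variable such as hair color.

```python
from enum import Enum


class VariableType(Enum):
    NOMINAL = "categorical (nominal)"
    ORDINAL = "categorical (ordinal)"
    DISCRETE = "quantitative (discrete)"
    CONTINUOUS = "quantitative (continuous)"


# Example variables taken from the notes, with their types.
examples = {
    "hair color": VariableType.NOMINAL,
    "letter grade": VariableType.ORDINAL,
    "age (years)": VariableType.DISCRETE,
    "weight (kg)": VariableType.CONTINUOUS,
}

# Letter grades listed from lowest to highest.
GRADE_ORDER = ["F", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]


def grade_rank(grade: str) -> int:
    """Position of a letter grade in GRADE_ORDER (0 is the lowest grade)."""
    return GRADE_ORDER.index(grade)


def is_categorical(vtype: VariableType) -> bool:
    """True for nominal and ordinal variables, False for quantitative ones."""
    return vtype in (VariableType.NOMINAL, VariableType.ORDINAL)


for name, vtype in examples.items():
    print(f"{name}: {vtype.value}")

# Sorting by rank uses the logical order of the ordinal categories.
print(sorted(["B", "A+", "F", "C-"], key=grade_rank))
```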


Example: A medical study. Data from a medical study contain values of many variables for each of the people who were the subjects of the study. Which of the following variables are categorical and which are quantitative? Which subtype is each (nominal or ordinal; discrete or continuous)?

a) Age (years)

b) Smoker (yes or no)

c) Systolic blood pressure (millimeters of mercury)

d) Level of calcium in the blood (micrograms per milliliter)

e) Drug effectiveness (1=strongly agree, 2=agree, 3=neutral, 4=disagree, 5=strongly disagree)

Definition: An identifier variable is a categorical variable that records a unique value for each case, used to name or identify it. For example, your SIN, a book’s ISBN, or a licence plate number.

Example: Each student has their own characteristics: age, student ID, height, gender, and grade. Consider the following table:

The column titles give the “What” and the row titles give the “Who”. What type of variable is student ID? Student ID is known as an identifier variable. It is not a quantitative variable, as it doesn’t have units. It is a categorical variable with one individual in each category. The university assigns you a unique student ID so that it can identify you in its system.
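A minimal sketch of the "one individual in each category" idea follows. The student records are hypothetical (the original table is not reproduced here), and `column` and `is_identifier` are helper names invented for this illustration. Each dictionary is one row (a "Who"), and each key is one column (a "What").

```python
# Hypothetical student records; each dict is one case, each key one variable.
students = [
    {"student_id": "1000001", "age": 19, "height_cm": 170.2, "gender": "F", "grade": "A-"},
    {"student_id": "1000002", "age": 20, "height_cm": 181.5, "gender": "M", "grade": "B+"},
    {"student_id": "1000003", "age": 19, "height_cm": 165.0, "gender": "F", "grade": "B+"},
    {"student_id": "1000004", "age": 22, "height_cm": 176.8, "gender": "M", "grade": "C"},
]


def column(rows, name):
    """Return the values of one variable (one column) across all cases."""
    return [row[name] for row in rows]


def is_identifier(values):
    """True if every case has its own distinct value (no value is repeated)."""
    values = list(values)
    return len(values) > 0 and len(set(values)) == len(values)


for name in students[0]:
    print(f"{name}: all values unique = {is_identifier(column(students, name))}")
```

Uniqueness is necessary but not sufficient: in this small table the heights also happen to be all different, yet height is a quantitative variable with units. Whether a variable is an identifier depends on how it is used, not only on whether its values repeat.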


Why:

Why are you doing this? What is the reason for collecting data?

When and Where:

When and Where give us some information about the context. A different time or location can give a different meaning to the data.

Example: values recorded at the U of A in the 1960s may mean something different than similar values recorded last year (e.g. the average price of a pen that students use).

Example: salary in Canada vs. salary in a developing country.

How:

How is the data being collected? How is the variable being measured? Data must be gathered properly (without bias). Improper data collection methodology may lead to wrong conclusions. Proper data collection is critically important for appropriate analysis and for the validity of inferences.

Example: results from voluntary internet surveys are often useless (voluntary response bias).
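The effect of voluntary response bias can be illustrated with a small simulation. Everything here is an assumption made for the sketch: a population of 10,000 people in which exactly 30% hold a strong opinion, and made-up response rates of 50% for people with the strong opinion and 5% for everyone else. It is not a model of any real survey.

```python
import random

rng = random.Random(2004)  # fixed seed so the sketch is reproducible

# Hypothetical population: True means the person holds the strong opinion.
# Exactly 30% hold it, so the parameter is 0.30 by construction.
population = [True] * 3000 + [False] * 7000
true_proportion = sum(population) / len(population)

# Random sample: every individual is equally likely to be selected.
random_sample = rng.sample(population, 500)
random_estimate = sum(random_sample) / len(random_sample)


def responds(has_strong_opinion: bool) -> bool:
    """Assumed (made-up) response rates: 50% with the opinion, 5% without."""
    rate = 0.50 if has_strong_opinion else 0.05
    return rng.random() < rate


# Voluntary survey: individuals decide for themselves whether to respond.
volunteers = [person for person in population if responds(person)]
voluntary_estimate = sum(volunteers) / len(volunteers)

print(f"true proportion:    {true_proportion:.2f}")
print(f"random sample:      {random_estimate:.2f}")
print(f"voluntary response: {voluntary_estimate:.2f}")
```

Under these assumptions the random sample lands near the true proportion, while the voluntary survey overstates it badly, because people with the strong opinion are far more likely to respond.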

Example: One of the reasons that the Monitoring the Future (MTF) project was started was “to study changes in the beliefs, attitudes, and behavior of young people in the United States.” Data is collected from 8th, 10th, and 12th graders each year. To get a representative nationwide sample, surveys are given to a randomly selected group of students. In Spring 2004, students were asked about alcohol, illegal drug, and cigarette use. Describe the W’s, if the information is given. If the information is not given, state that it is not specified.

- Who: 8th, 10th, and 12th graders
- What: alcohol, illegal drug, and cigarette use
- Why: “to study changes in the beliefs, attitudes, and behavior of young people in the United States”
- When: Spring 2004
- Where: United States
- How: survey
