Title | GGR276 Lecture 5 - Yuhong He |
---|---|
Author | Howard Chen |
Course | Spatial Statistics |
Institution | University of Toronto |
GGR276 Spatial Data Analysis and Mapping
Lecture 5 | Data Representation & Inferential Statistics
Prof. Yuhong He
Midterm Study Guide
Uploaded to Blackboard
Midterm Date: Tuesday February 7, 2017, 1:00-3:00pm in DV2082
This is a 2-hour exam, worth 20% of your final course grade.
You are allowed to use non-programmable calculators and rulers.
You are not allowed to share calculators, rulers, or anything else with other students writing the exam.
Lecture 5 – Midterm Review
Sign in to SOCRATIVE
Input Room Code: HEGGR276
INPUT: LAST name, FIRST name (as it appears on ROSI/ACORN/BB)
Overview
• Data Representation
• Probability distributions
Readings for this class:
• Chapter 5 (5.1, 5.2, 5.4)
• Chapter 6 (all)
Data for 48 students in 2016 summer GGR276
Data Representation | GRAPHING
Need a way to organize data – raw data in a spreadsheet is not useful
To draw conclusions from data, they must be organized in a meaningful way:
• Models
• Graphs
• Charts
Easiest, most convenient method: frequency distribution
HISTOGRAM
• Divide the measurement into equal-sized categories (i.e. Bin Width).
• Draw a bar for each category so the bars’ heights represent the number or percentage in each category.
• Not good for small datasets.
[Figure: histogram "2016 Summer GGR276 | Height"; x-axis: Height (cm), Bin Width of 10 (140-149, 150-159, 160-169, 170-179, etc.); y-axis: Frequency]
R CODE: > hist(classdata$happy)
EXCEL: Data > Data Analysis > Histogram
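The binning described above can be sketched in R as follows. This is a minimal illustration, assuming a hypothetical vector `heights` (invented values in cm, not the class data); `breaks` places a bin edge every 10 cm, and `right = FALSE` makes each bin include its lower edge, so the categories read 140-149, 150-159, and so on.

```r
# Hypothetical heights in cm (illustrative only, not the GGR276 class data)
heights <- c(152, 158, 161, 163, 165, 167, 168, 170, 171, 172,
             174, 175, 176, 178, 181, 183, 186, 192)

# Equal-sized bins of width 10; right = FALSE makes each bin [lower, upper),
# so 140-149 falls in the first bin, 150-159 in the second, and so on
bin_edges <- seq(140, 200, by = 10)

h <- hist(heights, breaks = bin_edges, right = FALSE,
          main = "Height", xlab = "Height (cm)", ylab = "Frequency")

# Bar heights (number of observations in each bin)
h$counts
```

Changing `by = 10` to a different bin width shows how strongly the shape of a histogram depends on that choice, especially for a small dataset like this one.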
Presenting Qualitative Data
Use to describe:
• What categories have been measured
• How often each category has occurred
“How often” can be measured in 3 ways:
1. Frequency (straight counts)
2. Relative frequency (freq / # obs)
3. Percent = 100 * Relative frequency
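As a rough sketch, the three measures can be computed in R from a vector of category counts. The counts below are the ones shown on the GGR276 Sport Preference slide; the variable names are illustrative choices, not part of the lecture.

```r
# Counts from the GGR276 Sport Preference slide (they sum to 47)
sport_freq <- c(Basketball = 19, Hockey = 2, Baseball = 4,
                Soccer = 9, Football = 4, None = 9)

n_obs    <- sum(sport_freq)        # number of observations in the table
rel_freq <- sport_freq / n_obs     # relative frequency = freq / # obs
percent  <- 100 * rel_freq         # percent = 100 * relative frequency

# One row per category, rounded for display
round(data.frame(frequency = sport_freq,
                 relative_frequency = rel_freq,
                 percent = percent), 2)
```

Relative frequencies always sum to 1 and percents to 100, which is a quick check that no category has been missed.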
PIE CHART
R CODE: > pie(mm.vector)
EXCEL: Insert > Charts > Select Pie Chart from drop-down menu
BAR CHART
R CODE: barplot()
EXCEL: Insert > Charts > Bar or Column
GGR276 | Sport Preference
Sport | Frequency |
---|---|
Basketball | 19 |
Hockey | 2 |
Baseball | 4 |
Soccer | 9 |
Football | 4 |
None | 9 |
[Figure: bar chart "GGR276 | Sport Preference"; y-axis: Frequency (0 to 20)]
PARETO CHART
• Contains both bars and a line graph
• Bars display the values in descending/ascending order
• Line graph shows the cumulative totals
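One way to build such a chart in base R is sketched below, using the counts from the GGR276 Sport Preference slide. It is only an illustration of the idea (sorted bars plus a cumulative line), not necessarily how the slide's figure was produced.

```r
# Counts from the GGR276 Sport Preference slide
sport_freq <- c(Basketball = 19, Hockey = 2, Baseball = 4,
                Soccer = 9, Football = 4, None = 9)

# Bars in descending order
sorted_freq <- sort(sport_freq, decreasing = TRUE)

# Cumulative totals for the line graph
cum_total <- cumsum(sorted_freq)

# barplot() returns the x positions of the bar midpoints,
# which are reused to place the cumulative line over the bars
bar_mid <- barplot(sorted_freq, ylim = c(0, sum(sorted_freq)),
                   ylab = "Frequency", main = "Sport Preference (Pareto)")
lines(bar_mid, cum_total, type = "b")
```

The y-axis is stretched to the total count so that the cumulative line, which ends at the total, fits on the same plot as the bars.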
DOT PLOT
One dot represents each data point. Not good for large datasets
GGR276 | Pets
Pet | Frequency |
---|---|
Dog | •••••••• |
Cat | •••••• |
Reptile | •• |
Fish | •• |
Other | • |
No Pet | •••••••••••••••••••••••••••• |
[Figure: dot plots "Fastest Ever Driving Speed", 226 Stat 100 Students, Fall '98, shown separately for 100 Men and 126 Women; x-axis: Speed (70 to 160)]
Interpreting Graphs | Centre & Spread
Where is the data centered on the horizontal axis, and how does it spread out from the center?
• Symmetric: even distribution
• Bimodal: two peaks or clusters of data
• Positive Skew (Skewed right): tail on right, majority of data on left
• Negative Skew (Skewed left): tail on left, majority of data on right
Interpreting Graphs | Outliers
• Are there any strange or unusual measurements that stand out in the data set?
[Figure: two example plots, one labelled "No Outliers" and one labelled "Outlier"]
Presenting Quantitative Data
….and many more!
HISTOGRAM (see the 2016 Summer GGR276 | Height example earlier in this lecture)
Ogive (“oh-jive”) Chart
A graph that represents the cumulative frequencies for the classes in a frequency distribution
[Figure: ogive; x-axis: Job Confidence Score (0 to 10); y-axis: Cumulative % (0 to 100)]
What is the percentage of students having confidence greater than 7?
What telephone bill value is at the 50th percentile?
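Questions like these can be answered directly from the cumulative percentages. The R sketch below assumes a hypothetical vector `scores` of job confidence values (invented for illustration, not the class data), builds the cumulative percent for each score, and reads off the share above 7 and the score at the 50th percentile.

```r
# Hypothetical job confidence scores on a 0-10 scale (illustrative only)
scores <- c(3, 4, 5, 5, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 9, 9, 10, 10, 10, 10)

# Frequency per score, keeping every score 0..10 even if its count is zero
freq <- table(factor(scores, levels = 0:10))

# Cumulative percent: running total divided by the number of observations
cum_pct <- 100 * cumsum(freq) / length(scores)

# The ogive: cumulative percent plotted against the score
plot(0:10, cum_pct, type = "b",
     xlab = "Job Confidence Score", ylab = "Cumulative %")

# Percentage with a score greater than 7 = 100 - cumulative % at 7
pct_above_7 <- 100 - cum_pct[["7"]]

# Smallest score whose cumulative % reaches 50 (the 50th percentile)
median_score <- (0:10)[which(cum_pct >= 50)[1]]
```

On the plot, the same two answers come from reading the curve at score 7 on the horizontal axis, and at 50% on the vertical axis.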
SCATTERPLOT
This type of plot becomes more useful when plotting two variables (bivariate plots).
[Figure: scatterplot "GGR276 Happy vs Job Confidence"; x-axis: Happy (0 to 10); y-axis: Job Confidence (0 to 12)]
Bivariate SCATTERPLOT
• Summarizes the relationship between two quantitative variables
• Horizontal axis (X-axis) represents one variable, vertical axis (Y-axis) represents the other
R CODE: > plot(x, y), where x and y are the two quantitative variables
EXCEL: Insert > Charts > Scatter
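A minimal R sketch of a bivariate scatterplot, assuming two hypothetical vectors of paired scores (`happy` and `job_confidence`, with invented values rather than the class data):

```r
# Hypothetical paired scores on a 0-10 scale (illustrative only)
happy          <- c(2, 4, 5, 6, 6, 7, 8, 8, 9, 10)
job_confidence <- c(3, 4, 4, 6, 7, 6, 8, 9, 8, 10)

# First argument goes on the X-axis, second on the Y-axis
plot(happy, job_confidence,
     xlab = "Happy", ylab = "Job Confidence",
     xlim = c(0, 10), ylim = c(0, 12),
     main = "Happy vs Job Confidence")
```

Both vectors must have the same length, since each point pairs one x value with one y value.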
3D GRAPHS
In some cases, 3D graphs can tell a better story.
Geovisualization
Presenting spatial data
Uses concrete visual representations and human visual abilities to enhance the communication of the spatial properties of phenomena or processes
Facilitates identification and interpretation of spatial patterns and relationships in complex data
GEOVISUALIZATION | Obesity Trends in Canada 1985-2003
Definitions:
• Obesity: having a high amount of body fat in relation to lean body mass.
• Measured using Body Mass Index (BMI), a measure of an adult’s weight in relation to his or her height: $\mathrm{BMI} = \dfrac{\text{Weight (kg)}}{(\text{Height (m)})^2}$
• BMI > 25 = Overweight; BMI > 30 = Obese
Data shown in the following slides comes from 3 sources:
• HPS: Health Promotion Survey
• NPHS: National Population Health Survey
• CCHS: Canadian Community Health Surveys, Statistics Canada
Source: P.T. Katzmarzyk, Unpublished Results. Data from: Statistics Canada. Health Indicators, June, 2004 (Acknowledgement: Dora Pouliou, PhD)
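The BMI definition and cut-offs above translate into a few lines of R. This is a sketch only: the function names and the sample weights and heights are hypothetical, and the label used for BMI values at or below 25 is an assumption, since the slide does not name that category.

```r
# BMI = weight (kg) / (height (m))^2
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Cut-offs from the slide: BMI > 25 = Overweight, BMI > 30 = Obese.
# "Not overweight" is an assumed label; the slide does not name this group.
bmi_category <- function(b) {
  ifelse(b > 30, "Obese",
         ifelse(b > 25, "Overweight", "Not overweight"))
}

# Hypothetical adults (illustrative values only)
b <- bmi(weight_kg = c(60, 80, 100), height_m = c(1.70, 1.75, 1.80))
data.frame(bmi = round(b, 1), category = bmi_category(b))
```

Height must be in metres; entering centimetres by mistake shrinks every BMI by a factor of 10,000.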
Obesity Trends Among Canadian Adults (*BMI ≥ 30, or ~ 30 lbs overweight for 5’4” person)
HPS, 1985
No Data...