Title | Week 2 Linear Regression Analysis - tutorial worksheet |
---|---|
Course | Introduction To Statistical Reasoning |
Institution | Monash University |
Week 2 worksheet from the weekly tutorial series. Covers association, correlation, residuals and regression analysis. Class from 2020....
SCI1020: Introduction to Statistical Reasoning
WEEK 2: LINEAR REGRESSION
EXPLORING DATA - Relationship between two Quantitative Variables
Student's Name:
Tutorial Day/Time:
PRELIMINARY READING: D S Moore et al, “Basic Practice of Statistics”, Chs 4-5.
On completion of this workshop you should be able to:
1. Produce a scatterplot of quantitative data with appropriate explanatory and response axes;
2. Recognise a linear pattern and the general formula for a straight line;
3. Calculate a predicted value given the equation of the linear regression line;
4. Add a linear line of best fit to data using MS EXCEL, and describe the regression line (equation and correlation);
5. Assess the closeness of fit using the least-squares criterion as reflected in the correlation coefficient;
6. Obtain residual values and interpret their size and distribution about the line in the form of a residual plot;
7. Find or calculate and interpret the squared correlation, r².
PRELIMINARY QUESTIONS: These problems are to help you engage with the lecture material, and also to make sure that everyone is up to speed before the workshop starts. Please make sure you do them before class each week!
Q.1 State in your own words what is meant by each of the terms listed below. Be specific.
Term | Definition |
---|---|
Explanatory variable | Also known as the independent variable. It is usually manipulated to see how it influences the response variable. |
Response variable | Also known as the dependent variable. Its change is explained or affected by the explanatory variable; it measures the outcome of a model. |
Association | A relationship between two variables that are statistically dependent on each other, described by the direction of the trend (positive, negative or none). |
Correlation | A measure of how strong the linear association between the two variables is. It is quantified by the correlation coefficient r. |
Regression line | A straight line that describes how the response variable (y) changes as the explanatory variable (x) changes; the line of best fit. |
Residual | The difference between the actual value of y and the value of y predicted by the regression line (ŷ), for each x value in the data. It is calculated as residual = y − ŷ. |
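The definitions of the regression line and the residual can be made concrete with a short calculation. The following is a minimal sketch in Python, not part of the worksheet: the data values and the helper names `least_squares_fit` and `residuals` are made up for illustration. It fits the least-squares line and then computes each residual as observed y minus predicted y.

```python
# Hypothetical data, made up for illustration (not from the worksheet).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def least_squares_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line y = m*x + c."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    s_xx = sum((xi - x_bar) ** 2 for xi in xs)
    slope = s_xy / s_xx
    # The least-squares line always passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Residual = observed y minus predicted y, one value per data point."""
    return [yi - (slope * xi + intercept) for xi, yi in zip(xs, ys)]


m, c = least_squares_fit(x, y)
res = residuals(x, y, m, c)
print(f"slope = {m:.3f}, intercept = {c:.3f}")
print("residuals:", [round(r, 3) for r in res])
```

Residuals from a least-squares line sum to zero (up to rounding), which is one quick check that a fit has been computed correctly.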
Q.2 What is the general equation of a straight line? Define all the terms in the equation.
The general equation of a straight line is y = mx + c, where y is the response variable, m is the slope, x is the explanatory variable and c is the y-intercept (the value of y when x = 0).
Week 2
Copyright 2019: Monash University
Page | 1
Q.3 Do Q5.2 from Moore et al text, p.130. What is the regression line equation based on the description of the trend in this example?
R = No. of individuals taking up regular running exercise
C = No. of cigarettes smoked daily
Intercept = 48 million
Slope = -0.178 (for each unit increase in R, C decreases by 0.178)
Hence the regression line equation is C = -0.178R + 48 million
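Once the equation of a regression line is known, a predicted value follows by substituting an x value into it. The sketch below is illustrative only: it uses the slope and intercept written in the answer to Q.3, while the function name `predict` and the chosen value of R are assumptions that do not come from the worksheet or the textbook.

```python
def predict(x_value, slope, intercept):
    """Predicted response from the line: y-hat = slope * x + intercept."""
    return slope * x_value + intercept


# Slope and intercept as written in the answer to Q.3.
SLOPE = -0.178
INTERCEPT = 48_000_000  # "48 million"

# Hypothetical value of R, chosen only to illustrate the substitution.
r_value = 1_000_000
c_hat = predict(r_value, SLOPE, INTERCEPT)
print(f"Predicted C for R = {r_value:,}: {c_hat:,.0f}")
```

Predictions are only trustworthy for x values within the range of the data used to fit the line; substituting values far outside that range is extrapolation.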
WORKSHOP PROBLEMS:
Q.4 Demonstration of correlation and least squares regression.
a) Go to the website http://digitalfirst.bfwpub.com/stats_applet/stats_applet_5_correg.html (note that the spaces in the URL are underscores). Create a scatterplot with a linear trend (similar to plot #1 below). Observe the size of the correlation coefficient for different scatter patterns. Use “Draw your own line” to draw a line of best fit. Change the intercept and slope, trying to minimise the sum of the squares of the residuals as shown by the “relative SS” value. Compare your line with the one given by “Show least-squares line”, which is placed by calculation. No written answers are required here; just observe the values.
b) Describe the relationship in the x-y data plotted below:
[Figure: three scatterplots, numbered 1 to 3. Panel titles: "Change in pulse rate with exercise", "Quiz score vs chocolate consumption", "Measured radioactive decay". Horizontal axis labels: "Pulse rate before exercise (beats per minute)", "Daily Chocolate consumption (g)", "Time (mins)".]
Identify the association (positive/negative/none) and correlation (strong/moderate/weak/none) present.
PLOT | 1 | 2 | 3 |
---|---|---|---|
Association | None | Positive | Negative |
Correlation | None | Strong | Moderate |
Estimate r (if appropriate) | 0 | 0.9... | |
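Estimates of r read from a plot can be checked numerically when the data are available, and the same calculation shows what the applet in Q.4 a) is doing when it reports the sum of squared residuals. The sketch below is a rough illustration under stated assumptions: the data are made up (the values behind the plots are not given here), the "hand-drawn" slope and intercept are arbitrary guesses, and all function names are hypothetical. It computes r from standardised values, obtains the least-squares slope as r × s_y / s_x, and compares the sum of squared residuals for the two lines.

```python
from math import sqrt

# Hypothetical data with a strong positive trend (made up for illustration).
x = [60, 65, 70, 75, 80, 85, 90]
y = [88, 97, 101, 112, 115, 127, 130]


def mean(values):
    return sum(values) / len(values)


def std_dev(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Pearson r: sum of products of standardised values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = std_dev(xs), std_dev(ys)
    products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def sum_squared_residuals(xs, ys, slope, intercept):
    """Sum of (observed y - predicted y)^2 for the line y = slope*x + intercept."""
    return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(xs, ys))


r = correlation(x, y)

# Least-squares line from r and the standard deviations.
ls_slope = r * std_dev(y) / std_dev(x)
ls_intercept = mean(y) - ls_slope * mean(x)

# A hand-drawn guess at the line (arbitrary values, for comparison only).
guess_slope, guess_intercept = 1.2, 20.0

ss_ls = sum_squared_residuals(x, y, ls_slope, ls_intercept)
ss_guess = sum_squared_residuals(x, y, guess_slope, guess_intercept)

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")
print(f"least-squares line: y = {ls_slope:.3f}x + {ls_intercept:.3f}")
print(f"SS (least squares) = {ss_ls:.2f}, SS (hand-drawn guess) = {ss_guess:.2f}")
```

No other straight line can give a smaller sum of squared residuals than the least-squares line, and r² is the fraction of the variation in y that the line accounts for.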