Homework 9 of 36402 hw9 PDF

Title Homework 9 of 36402 hw9
Author Lesley Yan
Course Advanced Data Analysis
Institution Carnegie Mellon University

Homework 9
Advanced Methods for Data Analysis (36-402)
Due Wednesday, April 14, 2021 at 6:00pm EDT

You should always show all your work and submit both a writeup and R code.

• Assignments must be submitted through Gradescope as a PDF. Follow the instructions here: https://www.cmu.edu/teaching/gradescope/
• Gradescope will ask you to mark which parts of your submission correspond to each homework problem. This is mandatory; if you do not, grading will be slowed down, and your assignment will be penalized.
• Make sure your work is legible in Gradescope. You may not receive credit for work the TAs cannot read. Note: If you submit a PDF with pages much larger than 8.5 × 11”, they will be blurry and unreadable in Gradescope.
• For questions involving R code, we strongly recommend using R Markdown. The relevant code should be included with each question, rather than in an appendix. A template Rmd file is provided on Canvas.

1. Comparing abalone models, redux. This problem uses the same abalone data set that we used in Homework 8. In Homework 8, you should have done exploratory analysis of the response, Shucked.weight, and the three predictors: Diameter, Length, and Height. Revisit that EDA so you’re familiar with the data.

We will explore linear and smoothing spline models for predicting the response from some or all of the predictors. Whenever you need a smoothing parameter for a smoothing spline in this homework, use the value smooth.spline chooses automatically. This applies both to the full data and to the bootstrap samples.

In Homework 8, you fit two models:
Model 1: A linear regression of log(Shucked.weight) on the logarithms of all three predictors.
Model 2: A kernel regression of Shucked.weight on all three predictors: Diameter, Length, and Height.

In this problem, you should fit an additional model:
Model 3: A smoothing spline for predicting Shucked.weight from the product Diameter × Length × Height.
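Written out, the three models take the following form. The shorthand W = Shucked.weight, D = Diameter, L = Length, H = Height, and the symbols β, m, s, ε are our own notation, not the assignment's; m is the regression function estimated by the kernel smoother and s the one estimated by the smoothing spline.

```latex
\begin{align*}
\text{Model 1:}\quad & \log W = \beta_0 + \beta_1 \log D + \beta_2 \log L + \beta_3 \log H + \epsilon \\
\text{Model 2:}\quad & W = m(D, L, H) + \epsilon \\
\text{Model 3:}\quad & W = s(D \cdot L \cdot H) + \epsilon \\[4pt]
\text{Model 1, exponentiated:}\quad & W = e^{\beta_0}\, D^{\beta_1} L^{\beta_2} H^{\beta_3}\, e^{\epsilon}
\end{align*}
```

The last line shows how Models 1 and 3 relate: if β₁ = β₂ = β₃ = 1, Model 1 becomes W = e^{β₀} (D·L·H) e^{ε}, so the mean response depends on the predictors only through their product. Model 3 keeps that "product only" structure but replaces the power law with an arbitrary smooth function s and uses an additive error.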

36–402

Spring 2021

Note: Again, when making predictions with Model 1, remember that the predict function will use the predictors correctly, but it will predict the logarithm of the response. Use exp when appropriate to get predictions of Shucked.weight.

(a) Let’s start with EDA. On Homework 8, you were asked to do some basic EDA for this data. Let’s expand on that to understand what kind of EDA you would do when faced with a problem like this one. Make the following plots:
1. Univariate plots (such as histograms) of each predictor before and after taking the log transformation. Comment on whether the log transform seems appropriate.
2. Bivariate plots of the predictors against the response, on the scale used by Model 1 (logs) and the scale used by Model 2 (original units). Comment on whether the linearity assumption in Model 1 seems reasonable, and whether the log transform improves the relationship.
3. A plot of Shucked.weight against the product of the three predictors. Comment on what you expect from Model 3 and how this relationship looks.

Your goal is to use the EDA to form an opinion about the right modeling choices (transformations, linearity assumptions, choice of smoother) before you begin fitting models. This helps you make your modeling decisions, and also helps you understand the data so you can interpret the model results and notice if something goes amiss.

(b) Examine residuals from Models 1 and 3 and say what the distributions of the residuals imply for the choice of bootstrap method to use for this data.

(c) Now let’s switch to using Model 3, the smoothing spline model. We will use the bootstrap to compute approximate confidence intervals for the mean response r(x) for each x shown in this table:

Length   Diameter   Height
    80         50       12
   475        340      110
   525        465      142
   655        490      156
   875        750      450

Use B = 1000 bootstrap samples drawn using the method you chose in part (b). For each bootstrap sample b, fit a smoothing spline and use predict to predict the response Shucked.weight at each of the values in the table. (We can call these $\hat{r}^*_b(x)$.) For each x in the table, compute a 95% pivotal bootstrap confidence interval for r(x). Produce a table showing the lengths, diameters, and heights, plus the predicted $\hat{r}(x)$ and the 95% confidence interval for r(x).
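The bootstrap steps in parts (b) and (c) can be sketched as below. This is a minimal illustration in Python rather than the R the assignment requires, and every name in it (quantile, pivotal_ci, fit_line, bootstrap_cis) is hypothetical. In particular, fit_line is an ordinary least-squares line standing in for smooth.spline so that the sketch runs with the standard library alone; a real solution would refit a smoothing spline on every bootstrap sample. The pivotal interval at each x is $[2\hat{r}(x) - q_{0.975},\ 2\hat{r}(x) - q_{0.025}]$, where $q_p$ is the p-th quantile of the bootstrap predictions $\hat{r}^*_b(x)$. The resample argument switches between resampling residuals (reasonable only if the residuals look identically distributed, e.g. no trend in their spread) and resampling cases (which does not need that assumption).

```python
import random


def quantile(sorted_vals, p):
    """Quantile by linear interpolation between order statistics
    (the same rule as R's default quantile type 7)."""
    h = (len(sorted_vals) - 1) * p
    lo = int(h)  # floor, since h >= 0
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (h - lo) * (sorted_vals[hi] - sorted_vals[lo])


def pivotal_ci(r_hat, boot_vals, alpha=0.05):
    """Pivotal interval [2*r_hat - q_{1-alpha/2}, 2*r_hat - q_{alpha/2}]."""
    s = sorted(boot_vals)
    return (2 * r_hat - quantile(s, 1 - alpha / 2),
            2 * r_hat - quantile(s, alpha / 2))


def fit_line(x, y):
    """HYPOTHETICAL stand-in for smooth.spline: an OLS line.
    Returns a function that predicts y at a new x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x0: intercept + slope * x0


def bootstrap_cis(x, y, x_new, fit=fit_line, B=1000, resample="cases", seed=1):
    """Return (r_hat, lower, upper) for each point in x_new."""
    rng = random.Random(seed)
    n = len(x)
    model = fit(x, y)
    fitted = [model(xi) for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    boot = [[] for _ in x_new]  # boot[j] collects r*_b(x_new[j]) over b
    for _ in range(B):
        if resample == "residuals":
            # Keep x fixed; add residuals drawn with replacement to the fitted values.
            xb = x
            yb = [fi + rng.choice(resid) for fi in fitted]
        else:
            # Resample whole (x, y) cases with replacement.
            idx = [rng.randrange(n) for _ in range(n)]
            xb = [x[i] for i in idx]
            yb = [y[i] for i in idx]
        model_b = fit(xb, yb)
        for j, x0 in enumerate(x_new):
            boot[j].append(model_b(x0))
    results = []
    for j, x0 in enumerate(x_new):
        r_hat = model(x0)
        results.append((r_hat, *pivotal_ci(r_hat, boot[j])))
    return results
```

For Model 3, x would hold Length × Diameter × Height for each abalone and x_new the same product for each of the five table rows; each returned tuple (estimate, lower, upper) then supplies the last three columns of the requested table.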
