HW9 Solns PDF

Title	HW9 Solns
Author	jj oi
Course	Statistical Methods For Data Mining
Institution	Northwestern University
Pages	10
File Size	264.4 KB
File Type	PDF
Total Downloads	108
Total Views	155

Summary

hw9...

Description

HW 9 Solutions

1) Revisit the Prob 1 worksheet of HW7_data.xls, which contains data from a prostate cancer study. The goal of the study was to understand the relationship between prostate-specific antigen (PSA, the response y) and a number of clinical measurement variables (the predictor variables) in men with advanced prostate cancer. Data for n = 97 subjects are included. The study had seven predictor variables, but this data set contains only three: cancer volume (x1), prostate weight (x2), and capsular penetration (x3). For this problem, you will use the gbm package and function in R to fit a boosted tree to the prostate cancer data.

a) Use the built-in CV (via the gbm and gbm.perf functions together) to find the best number of trees to include in the final boosted tree. For consistency, use n.trees = 5000, shrinkage = 0.02, interaction.depth = 3, bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 3, cv.folds = 10. Repeat the gbm and gbm.perf functions a few times to see how much the results change from replicate to replicate, and discuss what you see. Averaging the results across the multiple replicates, roughly what is the best number of trees for the final boosted tree, and roughly what is the corresponding CV r^2? PRO...
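
The procedure in part (a) can be sketched in R roughly as follows. This is a minimal sketch, not the posted solution. It assumes the worksheet has already been read into a data frame whose response column is named `y` and whose other columns are the predictors. It also assumes a Gaussian (squared-error) loss, which the problem statement does not name. The function names `fit_replicate` and `summarise_replicates` are hypothetical helpers, and the CV r^2 is approximated as one minus the minimum CV error divided by the sample variance of the response.

```r
library(gbm)

# One replicate: fit the boosted tree with the settings from part (a), then
# pick the number of trees by 10-fold CV.
# ASSUMPTION: `dat` has the response in a column named `y`; all other
# columns are used as predictors.
# ASSUMPTION: distribution = "gaussian" (not stated in the problem).
fit_replicate <- function(dat) {
  fit <- gbm(y ~ ., data = dat, distribution = "gaussian",
             n.trees = 5000, shrinkage = 0.02, interaction.depth = 3,
             bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 3,
             cv.folds = 10, verbose = FALSE)
  # Iteration with the smallest cross-validated error
  best_iter <- gbm.perf(fit, method = "cv", plot.it = FALSE)
  # Approximate CV r^2: 1 - (CV squared error) / (variance of the response)
  cv_r2 <- 1 - fit$cv.error[best_iter] / var(dat$y)
  c(best_iter = best_iter, cv_r2 = cv_r2)
}

# Repeat the fit several times and average. The CV fold assignment and the
# bagging are both random, so each replicate gives a different answer.
summarise_replicates <- function(dat, n_rep = 5) {
  reps <- t(replicate(n_rep, fit_replicate(dat)))
  list(replicates = reps, average = colMeans(reps))
}
```

Calling `summarise_replicates(dat)` returns a matrix with one row per replicate and the column averages, which is the form the question asks to be discussed. Loading the Prob 1 worksheet and naming its columns are left out because the source does not show the file layout.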