Title | HW9 Solns |
---|---|
Author | jj oi |
Course | Statistical Methods For Data Mining |
Institution | Northwestern University |
Pages | 10 |
File Size | 264.4 KB |

HW 9 Solutions

1) Revisit the Prob 1 worksheet of HW7_data.xls, which contains data from a prostate cancer study. The goal of the study was to understand the relationship between prostate-specific antigen (PSA, the response y) and a number of clinical measurements (the predictor variables) in men with advanced prostate cancer. Data for n = 97 subjects are included. The study had seven predictor variables, but this data set contains only three: cancer volume (x1), prostate weight (x2), and capsular penetration (x3). For this problem, you will use the gbm package and function in R to fit a boosted tree to the prostate cancer data.

a) Use the built-in CV (via the gbm and gbm.perf functions together) to find the best number of trees to include in the final boosted tree. For consistency, use n.trees = 5000, shrinkage = 0.02, interaction.depth = 3, bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 3, cv.folds = 10. Repeat the gbm and gbm.perf calls a few times to see how much the results change from replicate to replicate, and discuss what you see. Averaging the results across the multiple replicates, roughly what is the best number of trees for the final boosted tree, and roughly what is the corresponding CV r^2? PRO...
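
The replicate-and-average procedure in part a) could be sketched as follows. This is a minimal illustration, not the posted solution. It assumes a Gaussian (squared-error) loss, which the problem statement does not specify, and it estimates the CV r^2 as 1 minus the minimum CV error divided by the sample variance of y, which is one common convention. Because the HW7_data.xls worksheet is not reproduced here, the data frame `prostate`, its simulated values, and the helper `fit_once` are hypothetical stand-ins. The numbers this prints say nothing about the real prostate data.

```r
library(gbm)

# Hypothetical stand-in data: the real values are in the Prob 1 worksheet of
# HW7_data.xls. Everything below is simulated only so the sketch can run.
set.seed(1)
n <- 97
prostate <- data.frame(
  x1 = rexp(n, rate = 0.15),  # cancer volume (simulated)
  x2 = rexp(n, rate = 0.02),  # prostate weight (simulated)
  x3 = rexp(n, rate = 0.5)    # capsular penetration (simulated)
)
prostate$y <- 2 * prostate$x1 + 0.05 * prostate$x2 + 3 * prostate$x3 +
  rnorm(n, sd = 5)

# One replicate: fit with the settings from part a), then let gbm.perf pick
# the number of trees that minimizes the 10-fold CV error.
fit_once <- function(data, n_trees = 5000) {
  fit <- gbm(y ~ x1 + x2 + x3, data = data,
             distribution = "gaussian",  # assumption: squared-error loss
             n.trees = n_trees, shrinkage = 0.02, interaction.depth = 3,
             bag.fraction = 0.5, train.fraction = 1, n.minobsinnode = 3,
             cv.folds = 10, n.cores = 1, verbose = FALSE)
  best <- gbm.perf(fit, method = "cv", plot.it = FALSE)
  # CV r^2 estimate: 1 - (CV mean squared error) / (sample variance of y)
  cv_r2 <- 1 - fit$cv.error[best] / var(data$y)
  c(best_trees = as.numeric(best), cv_r2 = cv_r2)
}

# Repeat a few times; the CV folds and the bagging are random, so the
# selected number of trees and the CV r^2 vary from replicate to replicate.
reps <- t(replicate(3, fit_once(prostate)))
print(reps)
print(colMeans(reps))  # averages across replicates
```

To apply this to the actual assignment, replace the simulated `prostate` data frame with the worksheet data, using whatever column names that file has. Increasing the number of replicates gives a steadier average at the cost of run time.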