Title | Automated Model Building Methods |
---|---|
Author | Ben Trumbo |
Course | Stats For Bus Appl Ii |
Institution | Northern Kentucky University |
Pages | 11 |
Chapter 18.4 – Automated regression methods Peterson STA-213
Situation: Dozens or hundreds of X variables to consider.
Automated regression procedures:
1. Forward selection – add X variables one at a time based on correlations.
Keep if statistically significant. Never remove variables once they are in the model.
2. Backward selection – start with all X variables in the model, then remove non-significant X variables one at a time.
3. Best subsets – run all possible combinations of the X variables. Keep the model that has significant terms and the “best stats” (highest R-sq and/or lowest RMSE).
4. Stepwise regression – add X variables one at a time based on correlations.
Keep if statistically significant. If an X variable already in the model becomes non-significant along the way, take it out.
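The stepwise procedure in item 4 can be sketched in a few dozen lines. This is a minimal illustration, not the routine any particular statistics package runs. It uses only the Python standard library and synthetic data. The names `t_two_sided_p`, `ols_p_values`, `p_values_for` and `stepwise` are made up for this sketch, and the default thresholds are simply chosen so that the entry cutoff is stricter than the removal cutoff. "Based on correlations" is implemented here as "enter the candidate with the smallest p-value given the variables already in the model", which is the same as picking the largest partial correlation with Y.

```python
import math
import random

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for a t statistic (Simpson's rule on the t density)."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    a = abs(t)
    h = a / steps
    s = pdf(0) + pdf(a) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    return max(0.0, 1 - 2 * s * h / 3)

def ols_p_values(X, y):
    """Least-squares fit of y on the columns of X; returns one p-value per column."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | I], reduced by Gauss-Jordan to [I | inverse of X'X].
    # Perfectly collinear columns would give a zero pivot and raise an error here.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        d = A[c][c]
        A[c] = [v / d for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [v - f * w for v, w in zip(A[r], A[c])]
    inv = [row[k:] for row in A]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    df = n - k
    mse = sse / df
    return [t_two_sided_p(beta[i] / math.sqrt(mse * inv[i][i]), df) for i in range(k)]

def p_values_for(data, names, y):
    """p-values for the named X variables in a model that also has an intercept."""
    X = [[1.0] + [data[name][i] for name in names] for i in range(len(y))]
    return dict(zip(names, ols_p_values(X, y)[1:]))  # drop the intercept's p-value

def stepwise(data, y, p_enter=0.10, p_leave=0.20, max_steps=100):
    selected = []
    for _ in range(max_steps):
        changed = False
        # Entry step: add the candidate with the smallest p-value below p_enter.
        best_name, best_p = None, p_enter
        if len(selected) + 2 < len(y):  # keep at least 1 residual degree of freedom
            for name in data:
                if name not in selected:
                    p = p_values_for(data, selected + [name], y)[name]
                    if p < best_p:
                        best_name, best_p = name, p
        if best_name is not None:
            selected.append(best_name)
            changed = True
        # Removal step: drop the weakest variable if its p-value exceeds p_leave.
        if selected:
            pvals = p_values_for(data, selected, y)
            worst = max(selected, key=pvals.get)
            if pvals[worst] > p_leave:
                selected.remove(worst)
                changed = True
        if not changed:
            break
    return selected

# Synthetic data (an assumption for illustration): y depends on x1 and x3 only.
random.seed(0)
n = 40
data = {f"x{j}": [random.gauss(0, 1) for _ in range(n)] for j in range(1, 6)}
y = [3 + 2 * data["x1"][i] - 1.5 * data["x3"][i] + random.gauss(0, 0.5) for i in range(n)]
selected = stepwise(data, y)
print(selected)
```

Dropping the removal step turns this into forward selection (item 1). Starting with every variable selected and running only the removal step gives backward selection (item 2).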
Example: 2014 NFL Statistical analysis (Dataset: 2014 NFL Statistics - wins vs. selected stats)
Which of 32 team statistics has a significant impact on wins? (Note there are n = 32 NFL teams.)
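A quick count shows why running every combination (best subsets) becomes impractical with this many candidates. The sketch below only does the arithmetic for 32 candidate statistics; the variable names are illustrative.

```python
from math import comb

n_candidates = 32  # number of team statistics in the example

# Every non-empty subset of the candidates is a possible model.
total_models = 2 ** n_candidates - 1

# Number of models with exactly 1, 2, or 3 X variables.
by_size = {k: comb(n_candidates, k) for k in (1, 2, 3)}

print(total_models)
print(by_size)
```

With over four billion candidate models, a selection procedure that adds or drops one variable at a time is far cheaper to run.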
Using Stepwise Selection:
Multiple linear regression results:
Dependent Variable: Wins
Independent Variable(s): PassPCT, PassYDS, PassYDS/Att, PassTD, PassINT, SACKed, RushYDS, RushYDS/A, RushFUM, 1PEN, 3PCT, 4ATT, Penalties, PenaltyYds, koRetAVG, PuntRetAVG, SACK, SackYDSL, PassesDef, PassesINTbydef, ForcedFum, PuntAVG, PuntNET, PuntsIN20, PuntFC, PuntAVGRet, DefRushYDS, DefRushYDS/A, DPassPCT, DPassYDS, DPassYDS/A, DPassTD
DPassTD has been deleted from the model. Reason: Tolerance of 0 is too low.
Warning: Numerical instability in the tolerance calculation. Results below may be inaccurate.
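A note on the tolerance message above. The notes do not define tolerance, so this assumes the software uses the usual definition: for each X variable, regress it on all the other X variables and take one minus that R-squared.

```latex
\mathrm{Tolerance}_j = 1 - R_j^2
```

Here $R_j^2$ is the R-squared from regressing $X_j$ on the remaining X variables. In this example there are 32 X variables but only n = 32 teams. The other 31 X variables plus an intercept give 32 coefficients, which in general is enough to reproduce the 32 values of DPassTD exactly. That makes $R_j^2 = 1$ and the tolerance 0, which is consistent with the variable being dropped.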
Stepwise results:
P-value to enter: 0.10
P-value to leave: 0.20
Step Variable Action P-value RMSE R-squared R-squared (adj)
1 3PCT Entered...