Boston Housing Prices - One part of the final exam project. Running regression analysis on data about

Course Econometrics
Institution Kennesaw State University
Pages 20

Summary

One part of the final exam project. Running regression analysis on data about home prices in Boston over a period of time and their statistical validity...


Description

Boston Housing Prices Econ 4710

I. Introduction

We received a data set with 507 entries. These entries all dealt with variables that determine the median value of owner-occupied homes (medv) in Boston. Many different factors go into the value of homes in Boston. We were given 13 different independent variables, each with a different weight of impact on housing prices. Our variables are listed below.

Independent
1. crim: per capita crime rate by town.
2. zn: proportion of residential land zoned for lots over 25,000 sq.ft.
3. indus: proportion of non-retail business acres per town.
4. chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
5. nox: nitrogen oxides concentration (parts per 10 million).
6. rm: average number of rooms per dwelling.
7. age: proportion of owner-occupied units built prior to 1940.
8. dis: weighted mean of distances to five Boston employment centres.
9. rad: index of accessibility to radial highways.
10. tax: full-value property-tax rate per $10,000.
11. ptratio: pupil-teacher ratio by town.
12. black: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
13. lstat: lower status of the population (percent).

Dependent
14. medv: median value of owner-occupied homes in $1000s.

Each independent variable differs in how much it affects the price of a home in Boston. We will need to run several tests and regressions to determine their weights; then we will be able to come up with an effective model that will hopefully be reliable in determining housing prices across the city.

II. Model

We will build a model using SPSS to explain the various factors that contribute to the different prices of housing in Boston. Using the data provided, we will run tests to determine the most statistically significant variables, check for heteroscedasticity, non-normality, and multicollinearity, and come to a conclusion that provides the best explanation possible.

II.1 Algorithms

Before doing anything else, because we had so many independent variables, we ran three variable selection algorithms to determine which of those 13 variables were actually needed.
We ran forward, backward, and stepwise variable selection (available for analysis in appendix Figures A, B, and C) and used intuitive knowledge of housing determinants to trim our variable list down. The backward selection algorithm, which begins with all of the provided variables and removes them one at a time, least significant first, immediately removed the variable age of units. This was surprising to us, as we thought the age of a home would be one of the main determinants of price: older homes tend to have sustained more structural wear and require more maintenance than newer homes. It then removed the variable non-retail acres, which was not a surprise to us: regardless of its use, residential or commercial, land is land and will always be valuable. The most expensive home could be built in the middle of nowhere using premium materials.

Next, we ran a forward selection algorithm, which examines variables one by one to determine their significance and whether they should be retained or discarded. According to this method, SPSS determined the least significant variables to be lower status of the population, rooms per dwelling, pupil-teacher ratio, and distance to employment centers. The only one here that really surprised us was the determination that pupil-teacher ratio was not very significant. Quality of the school system typically plays a large part in determining home prices, and pupil-teacher ratio is normally a large determinant of school system quality. Regardless, we accepted these results and made our judgements.

Lastly, as a third point of reassurance, we ran a stepwise selection algorithm, the results of which were relatively similar to those of the forward selection. Using the results of these three algorithms and our own knowledge of what can determine the cost of a home, we decided to eliminate pupil-teacher ratio, non-retail acres, age of units, lower status of the population, rooms per dwelling, and distance to the employment centers, and to proceed with crime rate, zone, the Charles River dummy variable, nitrogen oxide concentration, accessibility to highways, property tax, and race (black).

From these algorithms we decided to take out the age of units and non-retail acres per the backward selection algorithm. We concluded these two variables were not significant enough to be used in the final formula.
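The backward selection procedure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the SPSS implementation: the data are synthetic, the names `x1`, `x2`, and `noise` are hypothetical, and the removal rule approximates the SPSS criterion (probability of F-to-remove >= .100) with its large-sample equivalent, an F statistic below about 2.71.

```python
import random

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def f_to_remove(rows, y, names):
    """Fit y on an intercept plus the named columns; return {name: F statistic}."""
    X = [[1.0] + [row[n] for n in names] for row in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)
    # For a single coefficient, F-to-remove equals the squared t statistic.
    return {n: beta[i + 1] ** 2 / (s2 * inv[i + 1][i + 1])
            for i, n in enumerate(names)}

def backward_select(rows, y, names, f_threshold=2.71):
    """Drop the weakest predictor repeatedly until every F reaches the threshold."""
    kept, removed = list(names), []
    while kept:
        stats = f_to_remove(rows, y, kept)
        weakest = min(kept, key=stats.get)
        if stats[weakest] >= f_threshold:  # assumed large-sample cutoff for p = .100
            break
        kept.remove(weakest)
        removed.append(weakest)
    return kept, removed

# Synthetic demonstration: y depends on x1 and x2 but not on "noise".
random.seed(1)
names = ["x1", "x2", "noise"]
rows = [{n: random.gauss(0, 1) for n in names} for _ in range(200)]
y = [5 + 2 * r["x1"] - 3 * r["x2"] + random.gauss(0, 1) for r in rows]
kept, removed = backward_select(rows, y, names)
print("kept:", kept, "removed:", removed)
```

Each pass refits the model on the remaining predictors, which is why a variable that looks weak in the full model can survive once a correlated variable has been dropped.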
After using the algorithms, we also checked the plots to determine whether other variables were significant or insignificant.

II.2 Plots

Now that we had determined which variables we thought were most statistically significant, we plotted them against the dependent variable, median value of owner-occupied homes, to get a visual sense of how the variables interacted and to try to spot any heteroscedasticity. For each independent variable, we created a scatter plot to assess its relationship to the dependent variable (medv), all the while bearing in mind that correlation does not prove causation. Since other unknown variables affecting the fit could exist, we did this not to establish a firm relationship between the independent variables and the dependent variable, but to get a visual sense of how the variables we were provided with related to each other. The appendix contains a complete list of all scatter plots created in SPSS.

Figure D documents the relationship between the dependent variable, median value of owner-occupied homes, and the independent variable crime rate. The graph indicates that communities with lower median home values have more crime and communities with higher median home values have less crime. This makes intuitive sense, so we expected the graph to appear this way. There also appeared to be no heteroscedasticity in this graph, as the data were distributed relatively randomly and there was no cone shape in the plot.
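The visual check for a cone shape can be complemented by a rough numeric check. The sketch below is not what was done in SPSS; it is a simplified Goldfeld-Quandt-style comparison (no middle observations are dropped) on synthetic data, and the function names are hypothetical. It sorts the observations by the predictor, fits a separate line to the lower and upper halves, and compares their residual variances. A ratio near 1 suggests constant spread, while a ratio far from 1 suggests a cone.

```python
import random

def fit_line(xs, ys):
    """Least-squares intercept and slope for a one-predictor regression."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope

def residual_variance(xs, ys):
    """Residual sum of squares divided by its degrees of freedom (n - 2)."""
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return rss / (len(xs) - 2)

def spread_ratio(xs, ys):
    """Residual variance of the high-x half divided by that of the low-x half."""
    pairs = sorted(zip(xs, ys))
    half = len(pairs) // 2
    low, high = pairs[:half], pairs[half:]
    return residual_variance(*zip(*high)) / residual_variance(*zip(*low))

# Synthetic data (an assumption for illustration, not the Boston data).
random.seed(0)
xs = [random.uniform(1, 10) for _ in range(400)]
even = [30 - 2 * x + random.gauss(0, 2) for x in xs]        # constant spread
cone = [30 - 2 * x + random.gauss(0, 0.5 * x) for x in xs]  # spread grows with x

ratio_even = spread_ratio(xs, even)
ratio_cone = spread_ratio(xs, cone)
print(f"constant spread: {ratio_even:.2f}, cone shape: {ratio_cone:.2f}")
```

In a formal Goldfeld-Quandt test the ratio would be compared against an F distribution; here it serves only as a quick sanity check alongside the scatter plot.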

II.3 Plotting the Standardized Residuals

After determining which variables were significant and which were not, our next step was to test the normality of the data. To do this we plotted the regression standardized residuals against the values expected under a normal distribution.

II.4 Heteroscedasticity

III. Analysis

After analyzing the scatter plots and the results of the algorithms, we were able to eliminate all unnecessary variables, determine that the factors that would diminish the reliability of our model were absent, and come up with an accurate and representative model.

IV. Conclusion

Our final formula was as follows:

medv = -21.929 - .131(crim) + .035(zn) + 4.004(chas) - 3.664(nox) + 7.287(rm) + .189(rad) - .015(tax) + .014(black)

The R square was high enough for us to consider this model reliable; however, it needs more significant variables to be a better model.

V. Appendix

Figure A. Backwards Selection: Variables Entered/Removed (a)

Model 1
Variables Entered: LowerPopulation, CharlesRiver, RaceBlack, PupilTeacherRatio, Zones, CrimeRate, RoomsPerDwelling, NonRetailAcres, AgeOfUnits, HighwayAccessibility, Distance, NitrogenOxideConcentration, PropertyTaxRate (b)
Variables Removed: .
Method: Enter

Model 2
Variables Entered: .
Variables Removed: AgeOfUnits
Method: Backward (criterion: Probability of F-to-remove >= .100).

Model 3
Variables Entered: .
Variables Removed: NonRetailAcres
Method: Backward (criterion: Probability of F-to-remove >= .100).

a. Dependent Variable: MedianHomeValue
b. All requested variables entered.

Figure A.1 Backwards Selection: Model Summary

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .861a   .741       .734                4.745298182
2       .861b   .741       .734                4.740496292
3       .861c   .741       .735                4.736233824

a. Predictors: (Constant), LowerPopulation, CharlesRiver, RaceBlack, PupilTeacherRatio, Zones, CrimeRate, RoomsPerDwelling, NonRetailAcres, AgeOfUnits, HighwayAccessibility, Distance, NitrogenOxideConcentration, PropertyTaxRate

b. Predictors: (Constant), LowerPopulation, CharlesRiver, RaceBlack, PupilTeacherRatio, Zones, CrimeRate, RoomsPerDwelling, NonRetailAcres, HighwayAccessibility, Distance, NitrogenOxideConcentration, PropertyTaxRate

c. Predictors: (Constant), LowerPopulation, CharlesRiver, RaceBlack, PupilTeacherRatio, Zones, CrimeRate, RoomsPerDwelling, HighwayAccessibility, Distance, NitrogenOxideConcentration, PropertyTaxRate
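The Adjusted R Square column in the Model Summary can be cross-checked from R Square with the standard formula 1 - (1 - R^2)(n - 1)/(n - k - 1), where k is the number of predictors. The sketch below assumes that all 507 entries mentioned in the Introduction were used in the regression, and it takes the predictor counts (13, 12, 11) from footnotes a, b, and c. Because R Square is printed rounded to three decimals, the computed values can be expected to agree with the reported ones only to about that precision.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k predictors plus an intercept, fit on n rows."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

N_ROWS = 507  # assumption: every entry from the Introduction entered the regression
R2 = 0.741    # R Square as printed (rounded) for all three models

# model number -> (number of predictors, reported Adjusted R Square)
reported = {1: (13, 0.734), 2: (12, 0.734), 3: (11, 0.735)}

results = {m: adjusted_r_squared(R2, N_ROWS, k) for m, (k, _) in reported.items()}
for m, (k, adj) in reported.items():
    print(f"Model {m}: k={k}, computed {results[m]:.4f}, reported {adj:.3f}")
```

The calculation also shows why dropping AgeOfUnits and NonRetailAcres nudges Adjusted R Square upward even though R Square is unchanged at three decimals: the penalty term (n - 1)/(n - k - 1) shrinks as k falls.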

Figure B. Forwards Selection: Variables Entered/Removed (a)

Model 1
Variables Entered: LowerPopulation
Variables Removed: .
Method: Forward (Criterion: Probability-of-F-to-enter...

