SNHU MAT 240 Project 1 PDF

Title SNHU MAT 240 Project 1
Author Collin Wiltshire
Course Applied Statistics
Institution Southern New Hampshire University
Pages 8
File Size 251.7 KB
File Type PDF

Summary

Report: Median Housing Price Prediction Model for D. M. Pan National Real Estate Company per the assignment rubric....


Description

Median Housing Price Prediction Model for D. M. Pan National Real Estate Company

Report: Median Housing Price Prediction Model for D. M. Pan National Real Estate Company
Collin L. Wiltshire
Southern New Hampshire University


Median Housing Price Model for D. M. Pan National Real Estate Company


Introduction

The purpose of this report is to develop a model that D.M. Pan National Real Estate Company can use to predict median housing prices from data on homes sold in 2019. The analysis uses real estate data broken down by region and county, compared against national statistics for the housing market. The goal is to help D.M. Pan National Real Estate Company and its real estate agents judge how well square footage serves as a benchmark for listing prices on homes placed on the market. Data from scatterplots, histograms and tables are used together to explain our business approach for determining list prices.

In the scatterplot, the data are expected to be spread out, with most points clustered together and a few outliers caused by high-priced housing markets in particular locations. Even so, a linear regression equation can be determined and used as a litmus test for listing price. Linear regression is the most appropriate choice here because median square feet can be placed on the x-axis as the predictor variable, with median listing price on the y-axis as the response variable. The square footage of any home can be obtained easily without an appraisal and can be used to give an initial predicted list price.

Data Collection

The National Real Estate County Data comprises median listing price, median cost per square foot and median square feet. The database consists of several hundred data points. For this analysis, the random select function within Excel was used to sort the data in random order, and the first 50 data points were used. As introduced earlier, median square feet is placed on the x-axis as the predictor variable and median listing price is placed on the y-axis as the response variable.

CHART 1

[Scatterplot: Median Listing Price (y-axis, $0 to $1,200,000) against Median Square Feet (x-axis, 1200 to 2800). Trendline: f(x) = 74.61x + 140224.63, R² = 0.02]
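The trendline in Chart 1 is the kind of line an ordinary least-squares fit produces. The following is a minimal sketch of that calculation, not the spreadsheet procedure actually used for this report. The function name `fit_line` and the four data pairs are hypothetical and are not taken from the report's 50-county sample.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x.

    Returns (b0, b1, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b1 = sxy / sxx                # slope
    b0 = mean_y - b1 * mean_x     # intercept
    # For a one-predictor model, R^2 is the squared correlation coefficient.
    r_squared = (sxy ** 2) / (sxx * syy) if syy else 1.0
    return b0, b1, r_squared


# Hypothetical (median square feet, median listing price) pairs for illustration only.
square_feet = [1400, 1800, 2200, 2600]
list_price = [245000, 275000, 300000, 335000]

b0, b1, r2 = fit_line(square_feet, list_price)
print(f"y = {b1:.2f}x + {b0:.2f}, R^2 = {r2:.4f}")
```

Applied to the actual sample, a fit of this kind would be expected to reproduce the slope, intercept and R² printed on Chart 1.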

Data Analysis

Chart 1 plots the relationship between the two variables, median list price and median square feet, with each axis corresponding to one variable. This makes it possible to see whether the variables are correlated, that is, whether one variable follows a predictable trend as the other changes. Here median square feet is a known quantity and is treated as the independent, or predictor, variable. Median list price is modeled as depending on the home's square footage and is therefore the dependent, or response, variable.

CHART 2

[Histogram: Frequency (y-axis) of Median Square Feet (x-axis)]

CHART 3

[Histogram: Frequency (y-axis) of Median Listing Price (x-axis)]

TABLE 1

                      Median Square Feet    Median Listing Price
Mean                  1922                  $283,589
Median                1871                  $243,234
Standard Deviation    307.5156599           $164,178.58
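The center and spread figures in Table 1 are standard descriptive statistics. As a minimal sketch of how they could be computed outside a spreadsheet, the helper below uses Python's `statistics` module. The name `summarize` and the input values are hypothetical, and the sample (n - 1) standard deviation is an assumption, since the report does not state which form was used.

```python
import statistics


def summarize(values):
    """Return the mean, median and sample standard deviation of a list."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (divides by n - 1); assumed, not confirmed.
        "stdev": statistics.stdev(values),
    }


# Hypothetical median-square-feet values, not the report's sample.
example = [1538, 1716, 1871, 1950, 2400]
summary = summarize(example)
print(summary)
```

Running the same helper once on the square-footage column and once on the listing-price column would give the two columns of a table laid out like Table 1.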

Chart 1 shows a linear trend in the sample data. Table 1 gives the center and spread for both Median Square Feet and Median Listing Price and is consistent with the regression depicted in Chart 1. Chart 1 displays two large outliers: Santa Cruz County, California, with 1716 median square feet and a median list price of $958,821, and Alameda County, also in California, with 1538 median square feet and a median list price of $830,040. These outliers are likely explained by location, since both counties are high-priced coastal California markets.

TABLE 2

                      Median Square Feet    Median Listing Price
Mean                  1944                  $288,407
Median                1901                  $256,936
Standard Deviation    367                   $163,986.00

Table 1 displays the statistics for the sample data used in this analysis, and Table 2 displays the statistics retrieved from the National Statistics and Graph. Comparing Table 1 and Table 2, the sample statistics are very close to those of the national market. Comparing the histograms of the sample data with the national statistics further confirms that the sample is representative of national housing sales.

The Regression Model

Chart 1 displays the linear regression line, which is the line of best fit. The model is most accurate for determining median list price within the range of median square feet covered by the data, approximately 1400 to 2600 square feet. Based on Chart 1, a regression model can be developed and used because of the linear relationship between median square feet and median list price. The errors of the data points also appear to have nearly equal variance around the regression line and to be approximately normally distributed.

Chart 1 shows a positive correlation: as median square footage increases, median list price increases. The data points are spread fairly widely around the regression line, which gives the first impression that the correlation is weak. This visual impression can be verified by calculating the correlation coefficient (r), either directly in Excel or by taking the square root of the R² value given in Chart 1. The Analytic Team used both methods for verification and determined that r = 0.1397. Because the calculated r falls within the criterion 0 < r < 0.40, the strength of the correlation is weak. This confirms the visual interpretation mathematically.

The effect of the two outlier data points identified earlier must be highlighted. If these two points are removed from the sample data, the regression line changes. The slope remains positive but becomes steeper, so listing price increases more sharply with square footage, and the strength of the correlation goes from weak to moderate. At this time it is recommended to keep the two outlier data points in the analysis.

The Line of Best Fit

Chart 1 displays the regression equation y = 74.61x + 140225, where y = list price and x = square feet. From this equation, the slope (b1) is $74.61 and the intercept (b0) is $140,225. The slope means that each additional square foot is associated with an increase of $74.61 in predicted median list price. The intercept has a meaningful interpretation only when the data include x values close to zero. In this sample the value closest to zero is approximately 1400 square feet, so the intercept cannot be practically used, for example, to estimate the value of the land by itself.

The coefficient of determination is R² = 0.0195, which means that only 1.95% of the variation in house listing price can be explained by the variation in house square footage. As a point worth being aware of, if the two outliers are removed, R² = 0.2115, so that 21.15% of the variation in listing price can be explained by the variation in square footage.

As an example of using the regression equation to predict a listing price, take a home of 2400 square feet. Note that this value is within the range of the line of best fit.

y = (74.61 × 2400) + $140,225
y = $179,064 + $140,225 = $319,289

Conclusions


In closing, the sample data is consistent with the national statistics for the real estate market, as the Analytical Team expected. The model and equation were found to be useful to our company as an initial tool for determining the list price of real estate before any further evaluation or appraisal of a home. Although useful, the equation is limited: it applies only within the range of square footage covered by the line of best fit, and it explains just under 2% of the variation in listing price. If the outliers were removed, this would increase substantially, to just over 21%. The question executives will have to answer is whether it would be beneficial for D.M. Pan National Real Estate Company to develop more than one regression model: one comprising only known high-end list price real estate in influential markets or counties, and another closely aligned with average American list price real estate....
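The arithmetic in the Regression Model and Line of Best Fit sections can be checked with a short sketch. The constants below are the values reported from Chart 1. The function names and the choice to reject inputs outside the 1400 to 2600 square foot range are assumptions made for illustration, not part of the report's method.

```python
import math

SLOPE = 74.61               # b1, dollars per square foot (from Chart 1)
INTERCEPT = 140225          # b0, dollars (from Chart 1)
R_SQUARED = 0.0195          # coefficient of determination as reported
X_MIN, X_MAX = 1400, 2600   # approximate range the report treats as reliable


def correlation_from_r_squared(r_squared, slope):
    """Recover r from R^2 for a one-predictor model; the sign follows the slope."""
    r = math.sqrt(r_squared)
    return r if slope >= 0 else -r


def predict_list_price(square_feet):
    """Predicted median list price; refuses to extrapolate (an assumed policy)."""
    if not X_MIN <= square_feet <= X_MAX:
        raise ValueError("square footage is outside the range of the fitted data")
    return SLOPE * square_feet + INTERCEPT


r = correlation_from_r_squared(R_SQUARED, SLOPE)
print(f"r is approximately {r:.4f}")
print(f"Predicted list price for 2400 sq ft: ${predict_list_price(2400):,.0f}")
```

Because R² is entered here as the rounded value 0.0195, the computed r can differ in the fourth decimal place from the reported r = 0.1397. The prediction for 2400 square feet matches the worked example of $319,289.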

