Project One Template PDF

Title Project One Template
Author Nicolette Digirolamo
Course Applied Statistics
Institution Southern New Hampshire University
Pages 7
File Size 197.8 KB
File Type PDF


Report: Median Housing Price Prediction Model for D. M. Pan National Real Estate Company
Nicolette DiGirolamo
Southern New Hampshire University


Median Housing Price Model for D. M. Pan National Real Estate Company


Introduction

In this report, I present predictions of median house prices for homes sold in the U.S. in 2019. The models show the relationship between the selling price of properties and their size in square feet. This information will help real estate agents judge how useful square footage is as a benchmark for listing prices.

Linear regression is most appropriate when the dependent variable has a linear relationship to the independent variable. A linear regression should meet the following assumptions: the true relationship is linear, the errors have equal variance around the line, the errors are normally distributed, and the observations are independent. When a linear regression fits well, I expect the points in the scatterplot to fall generally along the regression line.

In a linear regression, the dependent variable is the outcome, and the independent variable is the “predictor,” the variable the predictions are based on. In this case, the median house price is the dependent variable and the median square feet is the independent variable.

Data Collection

For this data, the median house price is the outcome (dependent variable) and the median square feet is the predictor (independent variable). To draw a random sample, I used the random number generator on calculator.net: I entered the range 6 to 983 (the row numbers that held the data I needed) and asked for 50 numbers. The 50 random numbers it returned identified the rows I used as my sample.
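The row selection described above can also be reproduced in code. The following is a minimal sketch, not the procedure actually used in the report: it assumes the rows should be drawn without repeats, and the function name and seed value are hypothetical.

```python
import random

FIRST_ROW = 6      # first spreadsheet row holding data, per the report
LAST_ROW = 983     # last spreadsheet row holding data, per the report
SAMPLE_SIZE = 50   # number of rows to draw, per the report


def draw_sample_rows(seed=None):
    """Return SAMPLE_SIZE distinct row numbers chosen uniformly at random."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no row is picked twice
    # (an assumption; the report does not say whether repeats were allowed).
    return sorted(rng.sample(range(FIRST_ROW, LAST_ROW + 1), SAMPLE_SIZE))


# The seed is arbitrary; it only makes the draw repeatable.
rows = draw_sample_rows(seed=2019)
print(rows)
```

Fixing a seed lets someone else regenerate the same 50 rows, which an online generator does not allow.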


[Figure: scatterplot titled "Variables in Property Listings". x-axis: Square Feet (500 to 3,000); y-axis: Median House Price ($0 to $800,000).]

Data Analysis

In a linear regression, the response variable is the outcome that is observed and measured, while the predictor variable is the one used to explain changes in the response, which may increase or decrease as the predictor changes.

                     Median listing price ($)   Median square feet
Mean                 325813.824                 1938.64485
Median               290093.131                 1954.44048
Standard deviation   144038.236                 384.026863
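Summary statistics like the ones in the table above can be computed with Python's standard library. This is a sketch under stated assumptions: the input values are hypothetical, not the report's sample, and the sample (n - 1) standard deviation is assumed because the report does not say which formula it used.

```python
import statistics


def summarize(values):
    """Return the mean, median, and sample standard deviation of a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation (n - 1 denominator); an assumption,
        # since the report does not state which formula was used.
        "stdev": statistics.stdev(values),
    }


# Hypothetical square-footage values, for illustration only (not the report's data).
example_square_feet = [1500, 1800, 1950, 2100, 2400]
summary = summarize(example_square_feet)
print(summary)
```

Running the same function on the listing-price column and the square-footage column gives the two columns of the table.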

According to my histogram, the distribution of my sample is symmetric, unimodal, and bell-shaped. In my scatterplot, I can identify a few outliers. For these houses, square footage appears to have little effect on the median listing price, or other important factors outweighed square footage. If you compare my sample of house sales with the national population, you will see that the national distribution of median square feet is similar to my sample: symmetric, unimodal, and bell-shaped. On the other hand, the national distribution of median listing price is less symmetric and skewed slightly to the right. My sample is representative of national housing market sales. One value in my data that is similar to the national data is the mean of the median square feet: the two means are within 6 square feet of each other. Another similarity is that the standard deviations of the median square feet differ by only 17 square feet. The differences lie in the mean, median, and standard deviation of the median listing price.


The Regression Model

[Figure: scatterplot titled "Variables in Property Listings" with fitted regression line f(x) = 32.56x + 262696.98. x-axis: Square Feet (500 to 3,000); y-axis: Median House Price ($0 to $800,000).]

I do think a regression model can be developed for the data: even though not all of the data points lie on the regression line, the line still gives a general prediction of Y for a given value of X. Simple linear regression is appropriate when the dependent variable, Y, has a linear relationship to the independent variable, X.
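To make the idea of fitting a line to scattered points concrete, here is a minimal least-squares sketch. The five data points are hypothetical, chosen only for illustration, and the function name is an assumption; the report's own line was presumably produced by spreadsheet software.

```python
def fit_line(xs, ys):
    """Return (slope, intercept, r) for the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation coefficient
    return slope, intercept, r


# Hypothetical points, for illustration only (not the report's sample).
square_feet = [1000, 1500, 2000, 2500, 3000]
prices = [250000, 310000, 290000, 400000, 350000]

slope, intercept, r = fit_line(square_feet, prices)
print(slope, intercept, round(r, 3))
```

Calling fit_line again with one point left out is a simple way to see how much a single outlier moves the slope and the correlation.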

Besides the obvious factor that a home with more square footage is generally more expensive, other factors can raise a listing price without adding square footage, such as living on or near the water, living in a gated community, or living in a secluded neighborhood with less crime.

Outliers usually decrease the value of the correlation coefficient and weaken the regression relationship, and they can even change the direction of the correlation. If I removed all of the outlier data points displayed on my graph, the slope of the regression line would change. Removing outliers above the line can lower the slope, and removing outliers below the line can raise it.

The data in my scatterplot trend in a positive direction. The relationship in my model is weak, but it is still linear. The correlation coefficient for this data is 0.08. When the Y variable increases as the X variable increases, there is a positive correlation between the variables, which is what my graph shows. Because the correlation between these two variables is weak, predictions from the model would not be accurate. It would be advantageous to collect more data before relying on predictions from this regression equation, since a correlation this weak needs more evidence to support a linear relationship between the two variables.

The Line of Best Fit

The regression equation for my data is y = 32.557x + 262697, with Y being median house price and X being square feet. The slope of a regression line is its rise over run: the change in Y for each one-unit increase in X. Here the slope is 32.557, which means that as X increases by 1 square foot, the predicted Y increases by about $32.56. The Y-intercept is the value of Y when X is zero, which here is $262,697.
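The slope and intercept can be read directly off the regression equation, and a short sketch makes their meaning explicit. The coefficients below come from the report's equation; the function name and the choice of 1,500 square feet as the test value are illustrative assumptions.

```python
SLOPE = 32.557      # dollars per additional square foot, from the report's equation
INTERCEPT = 262697  # predicted price in dollars at 0 square feet, from the report's equation


def predict_price(square_feet):
    """Predicted median house price from y = 32.557x + 262697."""
    return SLOPE * square_feet + INTERCEPT


# Slope: the change in predicted price for one extra square foot.
per_extra_sqft = predict_price(1501) - predict_price(1500)
print(round(per_extra_sqft, 3))

# Intercept: the predicted price when square footage is zero.
print(predict_price(0))

# Prediction for a 1,500-square-foot home.
print(round(predict_price(1500), 2))
```

The difference between any two predictions one square foot apart equals the slope, which is what "rise over run" measures.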

R squared is equal to 0.0075. To determine the direction of the association between the two variables, I have to determine whether the relationship is positive or negative; the slope of the regression line is positive, so the association is positive. With R squared equal to 0.0075 (a correlation coefficient R of about 0.08), I can say that the strength of the correlation is weak, since R is less than 0.40 but more than 0. A coefficient of determination of 0.0075 means that about 0.75% of the variation in median house price can be explained by the variation in square feet.
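The link between R squared and the correlation coefficient can be checked with a few lines of arithmetic. This sketch assumes simple linear regression with one predictor, where R is the square root of R squared and takes the sign of the slope; the function name is hypothetical.

```python
import math

R_SQUARED = 0.0075  # coefficient of determination reported in the text
SLOPE = 32.557      # slope of the report's regression equation


def correlation_from_r_squared(r_squared, slope):
    """Return R for a one-predictor regression: sqrt(R^2) with the sign of the slope."""
    return math.copysign(math.sqrt(r_squared), slope)


r = correlation_from_r_squared(R_SQUARED, SLOPE)
percent_explained = R_SQUARED * 100  # share of variation in Y explained by X

print(round(r, 4))
print(round(percent_explained, 2))
```

The square root comes out near 0.087, in line with the correlation of roughly 0.08 reported earlier, and the explained share is about 0.75%.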

With my regression equation, Y = 32.557x + 262697, I can predict how much to list a home for when given its square footage. If my property were 1,500 square feet, then substituting 1,500 for X gives 32.557(1,500) + 262,697 = 311,532.5, meaning I would list my house for $311,532.50.

Conclusions

After my research, data collection, and development of models to predict the median housing prices for homes sold in 2019, I conclude that the larger a home is in square feet, the higher its price tends to be, although the relationship in my sample is weak. These are the results I was expecting: I assumed that when buying houses, the more property there is, the more it is worth, and these results support that. If the data came from 3 different towns where all of the houses were on the water, the price per square foot would go up, producing different results. Based on my regression equation, each additional square foot of a house adds roughly $32.56 to the predicted price. Going forward, real estate agents can better determine the price of a house based on its square footage. One question for follow-up research is at which times of the year, or of a decade, the price per square foot goes up. For example, this year the Covid-19 pandemic caused house prices to increase 15%, which raises the price per square foot. It would be interesting to see at which historic moments or times of year home prices fluctuate...

