Lecture 9 - Regression with ARIMA Errors

Title Lecture 9 - Regression with ARIMA Errors
Author Alfonso Yagüe Yges
Course Business Forecasting
Institution Loughborough University


Business Forecasting Lecture 9 – Regression with ARIMA errors; monitoring the performance of a built model

Regression with ARIMA errors

We have seen Decomposition + ARIMA; now we will see Regression + ARIMA.

Motivation: dummy variable models with multiple regression

Data = Trend + Seasonal + Irregular

Dummy variable model with a linear trend:
Data = b0 + b1 Q1 + b2 Q2 + b3 Q3 + b4 time + error
▪ The time term captures the trend-cycle and the dummies Q1–Q3 capture the seasonal component
▪ What about the irregular component? Assumption: the error term does not contain any useful information, so it is ignored. This assumption needs to be checked via error analysis

This approach is unable to deal appropriately with the irregular component.

Motivating example

A catalogue company is interested in modelling the sales of its women's clothing. The company collected quarterly sales of women's clothing from Q1 1989 to Q4 1998.

Trend? Seasonality?

You can see a trend and a seasonal effect, so you consider the dummy variable model.

Example

Dummy variable model: the forecast variable women's clothing is regressed on time and the dummies Q1, Q2 and Q3. The SPSS output is displayed below. The adjusted R-square is 88.7%.
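As a sketch, the same kind of dummy-variable regression can be fitted by ordinary least squares. The quarterly sales figures below are synthetic stand-ins (the lecture's coefficients come from SPSS on the real catalogue data, which is not reproduced here):

```python
# Dummy-variable regression with a linear trend, fitted by OLS on
# synthetic quarterly data (10 years, Q4 as the baseline quarter).
import numpy as np

rng = np.random.default_rng(0)
n = 40                                    # 10 years of quarterly data
t = np.arange(1, n + 1)
quarter = (t - 1) % 4 + 1                 # 1, 2, 3, 4, repeating
q1 = (quarter == 1).astype(float)
q2 = (quarter == 2).astype(float)
q3 = (quarter == 3).astype(float)

# Synthetic series: trend + seasonal dummies + white-noise irregular term
y = 122 + 1.4 * t - 59 * q1 - 32 * q2 - 29 * q3 + rng.normal(0, 5, n)

X = np.column_stack([np.ones(n), t, q1, q2, q3])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef                      # residuals, to be checked for structure
print(coef)                               # [b0, b_time, b_Q1, b_Q2, b_Q3]
```

The estimated coefficients recover the trend and seasonal effects used to generate the data; the residuals `resid` are what the error analysis below inspects via the ACF and PACF.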

women's clothing = 122.368 + 1.439 time − 59.451 Q1 − 31.566 Q2 − 29.462 Q3 + error

Example: residual analysis

Any unexplained structure left in the residuals?

Does the assumption on the irregular component seem correct?

Since the ACF and PACF show no significant spikes, we have strong evidence supporting the model: it is reasonable to assume the error is 'white noise'.

Question: what do we do if there is unexplained structure left in the residuals? The built model needs to be refined!

In cases like these, there is strong evidence that some uncaptured information is left in the errors, which indicates that the model can be further improved.

General multiple regression

Multiple regression models revisited

The response variable y is regressed on variables x1, x2, ..., xm:
y = b0 + b1 x1 + b2 x2 + ... + bm xm + error
with coefficients b0, ..., bm.



Example: the dependent variable is "sales" and there are two explanatory variables:
▪ y = sales
▪ x1 = advertising expenditure; x2 = price difference



Assuming the error is white noise means assuming the error does not contain any useful information; in practice this assumption could be wrong.

Irregular component

Handling the irregular component in regression

● When the error terms of a time series regression model are autocorrelated, the model is inadequate
● We should remedy the problem by modelling the autocorrelation
● By taking autocorrelation into account, we can obtain more precise predictions

Sometimes when you apply a regression model to time series data, the forecast error contains an autocorrelation structure that the model fails to capture. One way to sort out this problem is to introduce an ARIMA model.

Multiple regression + ARIMA error

We assume the error contains an autocorrelation structure (it is not white noise), and we therefore use ARIMA to capture that information. Whenever we use an ARIMA model we need to think about how to choose p and q.

Multiple regression model with ARIMA error

The response variable y is regressed on variables x1, x2, ..., xm at each time point t:
yt = b0 + b1 x1t + b2 x2t + ... + bm xmt + error(t)



Use an ARIMA(p, d, q) model for the error
▪ Let nt denote the error at time point t
▪ The simplest model is AR(1): nt = a1 nt-1 + et, where et is white noise. No constant is required in the AR(1) model for the error
▪ Question: in general, how do we determine the orders of the ARIMA(p, d, q) for the error?

Model for the error term

If the error appears stationary, try an AR(1) or AR(2) for the error
▪ Regression with AR(1):
yt = b0 + b1 x1t + b2 x2t + ... + bm xmt + nt, where nt = a1 nt-1 + et and et is white noise at time point t





▪ Regression with AR(2):
yt = b0 + b1 x1t + b2 x2t + ... + bm xmt + nt, where nt = a1 nt-1 + a2 nt-2 + et and et is white noise at time point t

The resulting residuals of the regression must be checked
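A minimal sketch of regression with an AR(1) error, on simulated data. The two-step estimate below (OLS first, then the AR(1) coefficient from the lagged residuals, the first step of Cochrane–Orcutt feasible GLS) is a simplification of the joint fit a package such as SPSS would perform:

```python
# Regression with an AR(1) error term: simulate y_t = 2 + 3*x_t + n_t
# with n_t = 0.7 * n_{t-1} + e_t, then recover both the regression
# coefficients and the AR(1) coefficient a1.
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(0, 1, n)

# Build the AR(1) error series n_t = 0.7 * n_{t-1} + e_t
e = rng.normal(0, 1, n)
nerr = np.zeros(n)
for i in range(1, n):
    nerr[i] = 0.7 * nerr[i - 1] + e[i]

y = 2.0 + 3.0 * x + nerr

# Step 1: OLS fit of y on x
X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Step 2: AR(1) coefficient from the lag-1 regression of the residuals
resid = y - X @ b
a1 = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
print(b, a1)      # b close to (2, 3), a1 close to 0.7
```

Checking the residuals of the *refined* model (resid minus its AR(1) part) against the ACF/PACF is the step the lecture emphasises next.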

Example 1: Japanese motor vehicle production (1964–1989)

Yearly data; when you are analysing annual data there is no seasonal effect to deal with. A linear trend model seems plausible, as there is clear strong growth:
yt = b0 + b1 xt + nt
where xt is the variable 't' taking values 1, 2, ..., 26 (representing the year).



Assuming the error is white noise (i.e. nt = et) and using multiple regression to estimate the unknown coefficients: yt = b0 + b1 t + et

The built model is yt = 1670.655 + 471.806 t + et

Are the residuals white noise?

From the ACF and PACF above we have evidence against the assumption that the errors are white noise. To solve this problem, you need to combine an ARIMA model with your regression: you want to build a successful ARIMA model to capture the information left in the residuals.

Assuming AR(1) for the error: yt = b0 + b1 t + nt, where nt = a1 nt-1 + et and et is white noise

The built model is yt = 1660.563 + 463.690 t + nt, where nt = 0.736 nt-1 + et

Residuals of the refined model: any unexplained structure? We still assume the errors et are white noise, so we must check this assumption again using the ACF and PACF.



Further improvement? Further improvement is possible, but it would lead to a complicated model. The incremental gain from adding further terms would be small, since lag 6 is only marginally significant. We interpret lag 6 as significant by chance, because it does not have any physical meaning: we are analysing annual data, so lag 6 should not have any physical meaning.

Forecasting using the regression model

Forecasting using the regression with AR(1) for the error

The second model, the regression with the ARIMA error, clearly offers some benefit: the forecasts follow the pattern more closely. This model has improved the accuracy of the forecasts.

Exercise: energy bill data of a company

The quarterly energy bill data is displayed; it includes gas, oil and electricity consumption. Examine the data for patterns:
❖ Trend?
❖ Seasonal effect?
❖ Irregular component?

Dummy variable model



▪ Dummy variables you need to create: Q1, Q2, Q3, time and time^2
▪ Multiple regression equation you wish to build: energy = b0 + b1 time + b2 time^2 + c1 Q1 + c2 Q2 + c3 Q3 + error

SPSS output: any variable to remove?

From the analysis, we see that Q2 is not significant, so we take it out of the equation. We then run the regression again, this time without Q2.

Write down the regression equation for the refined model

The results after taking out Q2 tell us that the model is now adequate. Based on these coefficients, you can build the refined model.

The actual energy bill data and the forecast values: from the graph above we can see that the model has performed quite well. It follows the pattern and the variations fairly closely. However, there is a period where the gap is quite big, which shows that the model is not a perfect fit.

Error analysis
▪ Examine whether there is unexplained structure in the residuals
▪ Suggest a plausible model for the error

I would interpret the ACF as exponential decay and the PACF as a single spike at lag 1, which suggests that a more accurate model would use AR(1) for the error. We therefore refine the equation.

Write down the equations for the refined regression model with an ARIMA error

Any unexplained structure left in the residuals? If there is, what would you do?

Since there are two significant lags within the first 16 lags (as a rule of thumb, at most one significant lag within the first 20 lags can be attributed to chance), we cannot interpret both of them as significant by chance. We therefore believe that there is some useful information left in the error. As discussed before, we can either look for a better model (which would be very complicated) or accept that the model is not perfect and leave it as it is. I would accept that the model is not perfect and take no further action; in forecasting it can be the case that your model is not perfect, and we need to accept this.

Comparison of forecasting obtained from the two models

Actual data and forecasts using the regression

Actual data and forecasts using the regression with AR (1) error

In conclusion, with the AR(1) error the gap in certain periods has been substantially reduced, which means that using AR(1) is beneficial for capturing the remaining autocorrelation structure in the data.

Point forecasts

Point forecasts for regression with AR(1) error:
yt = b0 + b1 x1t + b2 x2t + ... + bm xmt + nt, where nt = a1 nt-1 + et and et is white noise

One-step-ahead forecast
❖ Combine the two equations: yt = b0 + b1 x1t + b2 x2t + ... + bm xmt + a1 nt-1 + et
❖ Replace t by t+1 in the equation: yt+1 = b0 + b1 x1,t+1 + b2 x2,t+1 + ... + bm xm,t+1 + a1 nt + et+1
❖ Calculate the forecast: Ft+1 = b0 + b1 x1,t+1 + b2 x2,t+1 + ... + bm xm,t+1 + a1 nt

where nt = yt − [b0 + b1 x1t + b2 x2t + ... + bm xmt]

Example 1 (continued): Japanese motor vehicle production (1964–1989)

Current time period: year 1989, with t = 26. One-step forecast of production in year 1990:
Ft+1 = 1660.563 + 463.690 (t+1) + 0.736 nt, where nt = yt − [1660.563 + 463.690 t]
▪ t = 26 and yt = 13026, hence n26 = −690.503
▪ F27 = 1660.563 + 463.690 × 27 + 0.736 × (−690.503) = 13671.98
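The arithmetic above can be reproduced directly from the fitted coefficients:

```python
# One-step forecast for 1990 from the fitted model
# y_t = 1660.563 + 463.690 t + n_t, n_t = 0.736 n_{t-1} + e_t
b0, b1, a1 = 1660.563, 463.690, 0.736
t, y_t = 26, 13026          # last observed year (1989) and its production

n_t = y_t - (b0 + b1 * t)   # residual at t = 26: about -690.503
F_27 = b0 + b1 * (t + 1) + a1 * n_t
print(round(n_t, 4), round(F_27, 2))   # -690.503, 13671.98
```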

Point forecasts for regression with AR(1) error:
yt = b0 + b1 x1t + b2 x2t + ... + bm xmt + a1 nt-1 + et

k-step-ahead forecast
▪ Replace t by t+k in the equation: yt+k = b0 + b1 x1,t+k + b2 x2,t+k + ... + bm xm,t+k + a1 nt+k-1 + et+k

▪ Calculate the forecast: Ft+k = b0 + b1 x1,t+k + b2 x2,t+k + ... + bm xm,t+k + a1 nt+k-1
where nt+k-1 = Ft+k-1 − [b0 + b1 x1,t+k-1 + b2 x2,t+k-1 + ... + bm xm,t+k-1]
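The k-step recursion can be sketched as a small function for the trend-plus-AR(1) case, using the Japanese-production coefficients as the example (the function name and its generalisation to k steps are for illustration):

```python
# k-step forecasts for y_t = b0 + b1*t + n_t with AR(1) error
# n_t = a1 * n_{t-1} + e_t, following the recursion above.
def forecast(b0, b1, a1, t, y_t, k):
    """Return the list [F_{t+1}, ..., F_{t+k}]."""
    n = y_t - (b0 + b1 * t)              # last observed residual n_t
    out = []
    for step in range(1, k + 1):
        F = b0 + b1 * (t + step) + a1 * n
        out.append(F)
        # Beyond one step the residual comes from the forecast itself:
        n = F - (b0 + b1 * (t + step))   # equals a1 * previous n
    return out

f = forecast(1660.563, 463.690, 0.736, 26, 13026, 3)
print([round(v, 2) for v in f])   # first value matches F27 = 13671.98 above
```

Note how the AR(1) contribution shrinks geometrically (each step multiplies the residual by a1), so the forecasts converge back towards the trend line.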

This is the same process as we discussed in week 7.

A more general scenario

In some applications, the error of your regression is not stationary. If that is the case, we take first differences, and not only of the forecast variable but of all the predictor variables as well. You take the first difference of all the variables involved and then build your model.

Non-stationary models for the errors

If a non-stationary model is required for the errors, the model can be estimated by first differencing all the variables and then fitting a regression with an ARMA model for the errors:

● Difference the forecast variable: ut = yt − yt-1



● Difference all the predictor variables:
v1t = x1t − x1,t-1
v2t = x2t − x2,t-1
...
vmt = xmt − xm,t-1



● Modelling: ut = b0 + b1 v1t + b2 v2t + ... + bm vmt + nt, where nt follows an ARMA(p, q) model
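A minimal sketch of the differencing step, with made-up non-stationary series: difference the forecast variable and every predictor, then fit the regression on the differenced data.

```python
# Regression after first-differencing: y and x1 are both non-stationary,
# but u_t = y_t - y_{t-1} regressed on v1_t = x1_t - x1_{t-1} recovers
# the slope linking the two series.
import numpy as np

rng = np.random.default_rng(2)
n = 50
x1 = np.cumsum(rng.normal(0, 1, n))                  # random-walk predictor
y = 5 + 2 * x1 + np.cumsum(rng.normal(0, 0.5, n))    # non-stationary error

u = np.diff(y)     # u_t  = y_t  - y_{t-1}
v1 = np.diff(x1)   # v1_t = x1_t - x1_{t-1}

X = np.column_stack([np.ones(n - 1), v1])
b, *_ = np.linalg.lstsq(X, u, rcond=None)
print(b)           # slope estimate close to 2
```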

In summary, you use regression, and if it turns out that the error contains some autocorrelation structure, you use ARIMA to capture it. The ARIMA component overcomes this limitation of plain regression analysis.

Monitoring the performance of a built model

Why monitoring?

Performance monitoring is essential because a business forecasting model is built on certain assumptions about seasonality, cyclical and trend patterns, and so on. When any of these components changes over time, for example because of its relation to the overall economy, the model needs to be revisited and changed, since it will have lost its predictive capability. Monitoring is therefore very important to make sure that our assumptions remain applicable over time.

Three issues in business forecasting:

● model building
● model testing
● application

1) Split the data into modelling data and holdback data; 2) build your model on the modelling data; 3) test your model against both the modelling and the holdback data.

Assumptions in model building

Trend
▪ increase or decrease?
▪ linear or nonlinear?

Seasonality
▪ constant seasonal factors?

Irregular variation
▪ AR or MA?

Additive or multiplicative model?

Assumptions in applications
● Assuming the patterns will persist into the future

Signal tracking: questions
● How do we know how the built model will perform in the future?
● What if the underlying economic situation has changed?
● When do we need to modify our model?

Signal tracking

A tracking signal is a measure that indicates whether the forecast is keeping pace with any genuine upward or downward changes in the forecast variable (demand, sales, etc.).

How do we calculate a tracking signal? The tracking signal is defined as the running sum of the forecast errors divided by the mean absolute deviation:
Tracking signal (t) = running sum of forecast errors (t) / MAD(t)

How does it work?

Forecast error
▪ Calculate the difference between the actual and forecast values
▪ Express the forecast errors as absolute values

Exercise: your employer, Jones & Associates, has built a model to forecast monthly demand in 2006. After nine months have passed and actual demand data have been collected, your boss asks you to develop a tracking signal to measure the accuracy of the forecasts. The actual demand and forecast sales are given in the following table.

How does it work?

Running sum
▪ Calculate the running (cumulative) sum of the forecast errors
▪ Make sure to add the forecast errors themselves, not their absolute values
▪ Running sum (t) = Running sum (t−1) + forecast error (t)



Calculate MAD: divide the sum of the absolute errors by the number of periods.



Alternative: using the recursive formula MAD(t) = (t−1) × MAD(t−1)/t + |error(t)|/t
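The recursive update is algebraically the same as recomputing the mean of the absolute errors from scratch; a quick check on made-up errors:

```python
# Direct MAD (mean of absolute errors so far) vs the recursive update
# MAD(t) = (t-1)*MAD(t-1)/t + |error(t)|/t -- the two agree exactly.
errors = [3.0, -1.0, 4.0, -2.0, 5.0]   # made-up forecast errors

mad_direct, mad_rec = [], []
prev = 0.0
for t, e in enumerate(errors, start=1):
    mad_direct.append(sum(abs(x) for x in errors[:t]) / t)
    prev = (t - 1) * prev / t + abs(e) / t
    mad_rec.append(prev)

print(mad_direct)
print(mad_rec)     # identical up to floating point
```

The recursive form is convenient in practice because each period only needs the previous MAD and the new error, not the whole error history.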



Calculate the tracking signal: divide the running sum by the corresponding MAD value
Tracking signal (t) = running sum (t) / MAD(t)
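The three steps above can be put together in a few lines. The demand and forecast figures below are made up for illustration (the exercise's actual table is not reproduced in these notes):

```python
# Tracking signal: running sum of forecast errors divided by MAD.
actual   = [100, 104,  99, 110, 108, 115, 112, 120, 118]   # made-up demand
forecast = [102, 103, 101, 106, 109, 111, 114, 116, 119]   # made-up forecasts

running_sum, abs_sum, signals = 0.0, 0.0, []
for t, (a, f) in enumerate(zip(actual, forecast), start=1):
    err = a - f              # forecast error (signed)
    running_sum += err       # running (cumulative) sum of errors
    abs_sum += abs(err)      # sum of absolute errors
    mad = abs_sum / t        # MAD up to period t
    signals.append(running_sum / mad)

print([round(s, 2) for s in signals])
```

With these figures every signal value stays inside ±2.50, so this hypothetical model would be judged in control.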



What does a tracking signal mean?
▪ It is the ratio of the cumulative error to the average deviation. Because a tracking signal is the ratio of the cumulative error produced by individual forecasts to the average deviation of those forecasts from their actual values, it "tracks" the forecasts to see whether they stay in step with the corresponding demand.



How do we know, on the basis of the tracking signal, that a model needs to be rebuilt?
▪ Using a control chart

Control chart

You decide the sample size. To construct a control chart for the target level (m) of a process, the standard deviation of the process (s) must be known. For samples of size n:
● a centre line at the target mean of the process, m
● warning lines at m − 2s/√n and m + 2s/√n
● action lines at m − 3s/√n and m + 3s/√n
● the centre line also acts as the (horizontal) time axis

In summary, to draw a control chart you need 3 pieces of information: 1) the target level μ; 2) the process variation (standard deviation) σ; 3) the sample size (how many items you want to inspect) n.

Interpreting a control chart: after each sample of size n has been taken, the monitored value calculated from the sample is plotted on the control chart. If:

● the value falls between the warning lines, the process is assumed to be "in control" (green light);



● the value falls outside the action lines, the process is assumed to be "out of control" (red light);



● the value falls between a warning line and an action line, wait for the next sample (yellow light). If the value of the second sample falls outside the warning lines, the process is assumed to be "out of control"; if it falls inside the warning lines, the process is assumed to be "in control".

Control chart for the tracking signal

Control chart:

● a centre line at the target value
● warning lines at target ± 2 × standard deviation / √n
● action lines at target ± 3 × standard deviation / √n

A control chart for a tracking signal: to construct it, we must know the target value and the standard deviation of the tracking signal.

● Target value = 0 and sample size n = 1
● 1 standard deviation = 1.25 × MAD
● Since the action lines for the sum of forecast errors are at 0 ± 3 × 1.25 × MAD = 0 ± 3.75 × MAD, the action lines for the tracking signal are at 0 ± 3.75. Similarly, the warning lines for the tracking signal are at 0 ± 2.50.
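The key point in the derivation above is that the MAD cancels when the error-sum limits are divided by MAD, leaving fixed limits for the signal itself. A quick arithmetic check (the MAD value here is arbitrary):

```python
# Control-chart limits for a tracking signal: target 0, n = 1, and
# sigma = 1.25 * MAD. Dividing the error-sum limits by MAD cancels MAD,
# so the signal's limits are the same whatever MAD happens to be.
MAD = 7.3                     # any positive MAD; hypothetical value
sigma = 1.25 * MAD

warning = 2 * sigma / MAD     # warning lines at +/- 2.50
action = 3 * sigma / MAD      # action lines at +/- 3.75
print(warning, action)
```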

Example

Company ABC would like to employ a tracking signal to measure the performance of its forecasting model.

● The tracking signal for this example stays well within the warning lines
● This indicates that the forecasting model being used is following the demand patterns closely enough for the time being

Tracking signal
● A tracking signal statistically determines whether a forecasting model is out of control
● Used by companies to track changes in patterns
● Calculated by dividing the most recent sum of forecast errors by the most recent estimate of MAD
● A tracking signal outside the established limits indicates that the forecasting model should be modified
● Compatible with any forecasting method

We discussed a very important practical issue: how to monitor the performance of a forecasting model. Every forecasting model makes specific assumptions about the trend, cyclical pattern, seasonality and the irregular component. It is important to monitor a model's performance closely, since some of these assumptions are very strong. The tracking signal is independent of the forecasting methodology used, so you can always use a tracking signal to monitor the performance of a model.

