To avoid the charge that I engaged in too much post hoc model selection, I defined a "standard" model which included TIME, TIME*TIME, TIME*TIME*TIME, and the averaged lagged terms A1-A6 described earlier, plus whatever seasonal or cyclical terms were used in the competing ARIMA models. This standard model fitted each of the three data sets better than any of the competing ARIMA models, even though in each case up to 30 ARIMA models had been applied to the same data. I then engaged in the same general sort of post hoc model selection that is commonly done in any area of model-building. That post hoc process is described below separately for each data set.
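As a concrete (and entirely hypothetical) illustration of fitting such a standard model, the sketch below builds a design matrix with a constant, TIME, TIME squared, TIME cubed, and averaged lagged terms, then fits it by ordinary least squares on synthetic data. Here Ak is taken to be the mean of the k most recent observations, which is one plausible reading of "averaged lagged terms" (the exact definition appears earlier in the article), and SEE is computed as the square root of SSE divided by residual df.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration only; not one of the three data sets in the text.
n = 60
t = np.arange(1, n + 1, dtype=float)
y = 50 + 0.8 * t - 0.02 * t**2 + 0.0003 * t**3 + rng.normal(0, 2, n)

# A-terms: here Ak is assumed to be the mean of the k most recent
# observations, so A1 is the lag-1 value and A6 averages lags 1-6.
max_lag = 6
rows = []
for i in range(max_lag, n):
    a = [y[i - k:i].mean() for k in range(1, max_lag + 1)]  # A1..A6
    rows.append([1.0, t[i], t[i] ** 2, t[i] ** 3] + a)
X = np.array(rows)
yy = y[max_lag:]

# Ordinary least squares fit; SEE = sqrt(SSE / (n - p)).
beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
resid = yy - X @ beta
see = np.sqrt(resid @ resid / (len(yy) - X.shape[1]))
print(round(see, 3))
```

With noise of standard deviation 2 built into the synthetic series, the fitted SEE should come out near 2, as a sanity check that the model is doing its job.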
Of the several ARIMA models Pankratz tried, the one with the smallest SEE included two AR terms and two MA terms plus a constant, and showed SEE = 6.72. After observing SEE = 6.22 with my standard model, I removed the quadratic and cubic terms because they contributed little. I then applied stepwise regression with default options to a model with a constant, TIME, and 10 lagged terms B1-B10. The stepwise program selected a model with a constant, TIME, and B1, B3, B4, B6, B7, and B10, and showed SEE = 5.40, well below either previous value. The stepwise program did not seem to capitalize on chance substantially more than Pankratz did, especially since the absolute values of t for the six lagged terms in the stepwise model were all over 3.0.
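The procedure just described can be sketched in a few lines. The code below runs a simple forward-stepwise search on synthetic data: the constant and TIME are always retained, the candidate pool is the lagged values B1 through B10, and at each step the lag whose addition most reduces SEE is entered, stopping when no addition helps. This is a generic sketch of stepwise selection under those assumptions, not a reconstruction of the particular stepwise program or data used in the article.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic series (not Pankratz's data); Bk is simply the value lagged k steps.
n = 90
y = np.cumsum(rng.normal(0, 1, n)) + 0.05 * np.arange(n)

max_lag = 10
t = np.arange(max_lag, n, dtype=float)
target = y[max_lag:]
lags = {k: y[max_lag - k:n - k] for k in range(1, max_lag + 1)}  # B1..B10

def see(cols):
    """SEE for a model with constant, TIME, and the given lag columns."""
    X = np.column_stack([np.ones_like(t), t] + cols)
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    r = target - X @ beta
    return np.sqrt(r @ r / (len(target) - X.shape[1]))

# Greedy forward selection: add the lag that most reduces SEE; stop when
# no remaining lag improves on the current SEE.
chosen, best = [], see([])
while True:
    remaining = [k for k in lags if k not in chosen]
    if not remaining:
        break
    trials = {k: see([lags[j] for j in chosen + [k]]) for k in remaining}
    k, s = min(trials.items(), key=lambda kv: kv[1])
    if s >= best:
        break
    chosen.append(k)
    best = s
print(sorted(chosen), round(best, 3))
```

Because SEE divides by residual df, it already penalizes each added variable slightly, so "stop when SEE stops falling" is a workable default stopping rule for this sketch.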
The presence of several long-lag variables in this model indicates that the model is in effect fitting local slopes. For instance, a model which gave B1 a positive weight (as this model did) and B10 a negative weight (as it also did) would be doing something rather similar to fitting a straight line to those two points and projecting it to the right to make the forecast.
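The two-point reading can be made exact with a little arithmetic: the straight line through the observations at lags 10 and 1 has slope (B1 - B10)/9, and projecting it one step ahead gives a forecast with weight 10/9 on B1 and -1/9 on B10, positive on the short lag and negative on the long one, just as in the fitted model. A minimal sketch:

```python
# Linear extrapolation through the observations at lags 1 and 10:
#   slope    = (B1 - B10) / 9
#   forecast = B1 + slope = (10/9)*B1 - (1/9)*B10
def local_slope_forecast(b1, b10):
    """One-step-ahead forecast from the line through lags 1 and 10."""
    return b1 + (b1 - b10) / 9.0

# If the series climbed from 2 (ten steps back) to 11 (one step back),
# the implied slope is 1 per step, so the forecast is 12.
print(local_slope_forecast(11.0, 2.0))  # → 12.0
```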
The inclusion of linear, quadratic, and cubic terms for TIME may seem like overkill for a small data set, but in the various models I tried with this data set, even the cubic term remained highly significant; in the final model described below it showed t = -5.484, df = 42, p = .000002. Due partly to the large semester effect, the cubic trend is hard to see in this data set until one looks for it, at which point it becomes unmistakable: the series rises visibly to about 1965, then falls to about 1970, then rises again.
I also added five lagged variables B1 through B5. Testing them as a set yielded p = .0054. After some experimentation I dropped B1, B2, and B4, leaving B3 and B5. The t's for these variables were -3.74 and -2.72 respectively, and testing these two variables as a set yielded p = .00030. These results seem stronger than could easily be explained by chance selection of two variables from five; the number of ways of selecting two items from five is only 10. The final model showed SEE = 25.03, well below the ARIMA value of 31.44.
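Two quantities in this paragraph are easy to check mechanically. The count of ways to select two variables from five is C(5,2) = 10, and "testing these two variables as a set" is the usual incremental F test comparing the model with and without the pair. The sketch below uses synthetic data, not the article's series; the names b3 and b5 are illustrative stand-ins for the two retained lags.

```python
import math
import numpy as np

# The chance-selection count from the text: ways to pick 2 lags out of 5.
print(math.comb(5, 2))  # → 10

# Incremental F test for adding a set of q predictors (generic sketch on
# synthetic data, not the article's series):
#   F = ((SSE_reduced - SSE_full) / q) / (SSE_full / (n - p_full))
rng = np.random.default_rng(2)
n = 50
t = np.arange(n, dtype=float)
b3 = rng.normal(size=n)  # hypothetical stand-in for the lag-3 variable
b5 = rng.normal(size=n)  # hypothetical stand-in for the lag-5 variable
y = 1 + 0.5 * t - 2.0 * b3 - 1.5 * b5 + rng.normal(size=n)

def sse(X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return r @ r

X_red = np.column_stack([np.ones(n), t])            # without the pair
X_full = np.column_stack([np.ones(n), t, b3, b5])   # with the pair
q = 2
F = ((sse(X_red) - sse(X_full)) / q) / (sse(X_full) / (n - X_full.shape[1]))
print(round(F, 1))  # a large F means the pair contributes jointly
```

Referring F to the F(q, n - p) distribution then gives the set-test p-value reported in the text.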
Again the nonlinear polynomial terms were highly significant. The cubic nature of the curve is clear once it is pointed out: the trend falls steeply, then less steeply, then more steeply again. This series also illustrates the substantive reasonableness of using polynomial terms. If you had to project the next observation in such a series, would you rather fit a straight line to the entire series, or take advantage of the fact that the last part of the curve is falling even more steeply than the curve as a whole? A polynomial essentially does the latter.
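The rhetorical question can be answered with a small experiment. On a synthetic series with this cubic shape (falling steeply, then less steeply, then more steeply), a straight line fitted to all but the last point badly misses the final observation, while a cubic fitted to the same points tracks the steepening tail. This is illustrative only, not the article's data:

```python
import numpy as np

# Synthetic falling series with a cubic shape: steep, then shallower,
# then steeper again (illustrative only, not the enrollment or sales data).
t = np.arange(20, dtype=float)
y = 100 - (0.05 * (t - 10) ** 3 + 1.5 * t)

# Fit on all but the last point, then forecast the last point.
t_fit, y_fit = t[:-1], y[:-1]
lin = np.polyval(np.polyfit(t_fit, y_fit, 1), t[-1])
cub = np.polyval(np.polyfit(t_fit, y_fit, 3), t[-1])

lin_err = abs(lin - y[-1])
cub_err = abs(cub - y[-1])
print(round(lin_err, 2), round(cub_err, 2))  # linear misses by far more
```

The straight line averages the early steep fall with the middle shallow stretch, so it underestimates how fast the tail is dropping; the cubic does not.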
In the housing-permit example the best model used both short-lag and long-lag terms. In the college-enrollment and sales examples the best models used both polynomial terms and long-lag terms. In the sales example A-terms worked best among the lag terms, while in the housing and college examples B-terms worked best. It may be that each observation in the sales example included more random error, so that averaging helped; in addition, A-terms are especially recommended for small sample sizes, and the sales data set was the smallest of the three.
Darlington, Richard (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161-182.
Darlington, Richard (1990). Regression and Linear Models. New York: McGraw-Hill.
Gilchrist, Warren (1976). Statistical Forecasting. New York: Wiley.
Pankratz, Alan (1983). Forecasting with Univariate Box-Jenkins Models: Concepts and Cases. New York: Wiley.