The SWEEP Algorithm for Multiple Regression
Richard B. Darlington
Cornell University
The SWEEP procedure is a very efficient way to compute the central
statistics used in multiple regression (multiple correlation, residual variance,
regression slopes, and standard errors of slopes, plus some other values)
from a square correlation or covariance
matrix. It is particularly useful when you want to compute a large number of
different regressions involving the same dependent variables but different sets
of independent variables, as in stepwise regression or
all-subsets regression.
As explained in another article,
I believe all-subsets regression can be useful in identifying patterns of collinearity,
and I use SWEEP in a computer program I have written for that purpose.
SWEEP's best-known disadvantage is that if it is applied many times to the
same matrix, rounding error can accumulate. However, this is not usually a
serious problem, and furthermore there are ways to check for it.
SWEEPing any row j of a correlation or covariance matrix C changes C into a
matrix we'll call G. SWEEPing proceeds as follows:
- Define a "pivot" value PV as pv=1/c[j,j].
- For entries not in row j or column j, compute g[h,i] = c[h,i] -c[h,j]*c[j,i]*pv
- For entries in row j but not column j, g[j,i] = c[j,i]*pv. In other words,
multiply thse entries by PV.
- For entries in column j but not row j, g[j,i] = -c[j,i]*pv. In other words,
multiply these entries by -PV.
- g[j,j] = pv
In a matrix-oriented computer language like GAUSS, it is most efficient to
ignore the "not's" in these directions, since those entries will later be
overwritten anyway. That is, after computing PV (step 1), apply step 2 to all entries
in the matrix, since the entries for which this yields incorrect values will be
overwritten by steps 3, 4, and 5. Similarly, apply step 3 to all entries in row j,
and apply step 4 to all entries in column j, since the only entry for which this
yields incorrect values -- entry g[j,j] -- will be overwritten in step 5. A
GAUSS proc (procedure) for sweeping is shown below.
proc sweep(j,c);
local pv,g;
pv = 1/c[j,j]; @ step 1 @
g = c - c[.,j]*c[j,.]*pv; @ step 2 @
g[j,.] = c[j,.]*pv; @ step 3 @
g[.,j] = -c[.,j]*pv; @ step 4 @
g[j,j] = pv; @ step 5 @
retp(g);
endp;
SWEEP is usually applied several times, perhaps many times. In each sweep,
after G is created from C, G is usually copied onto C, thus erasing C.
Therefore you should really think of SWEEP as a procedure for modifying C.
Since C is modified, before sweeping you typically want to save the original
form of C under some other name such as C0. The particular GAUSS program
just shown would typically be used in a command like "c = sweep(3,c)".
That particular command sweeps the third row of C, and denotes the modified
matrix as C.
Rows can be swept in any order, but to explain what sweeping does, we shall
consider an example in which we sweep rows 1, 2, and 3 of the 5 x 5
covariance matrix below.
9 3 4 -2 5
3 8 6 5 4
C = 4 6 7 3 1
-2 5 3 9 2
5 4 1 2 8
Sweeping rows 1, 2, 3 of C transforms it into the following 5 x 5 matrix:
0.1504 0.0226 -0.1053 -0.5038 0.7368
0.0226 0.3534 -0.3158 0.7744 1.2105
-0.1053 -0.3158 0.4737 0.0526 -1.3158
0.5038 -0.7744 -0.0526 3.9624 1.3684
-0.7368 -1.2105 1.3158 1.3684 0.7895
Spaces separate the swept rows, and corresponding columns, from the unswept
ones.
To understand what sweeping does, imagine that we want to run
regressions predicting variables 4 and 5 from variables 1, 2, and 3. More
generally, usually we usually sweep the predictor or independent variables,
and don't sweep the criterion or dependent variables. Sweeping does the
following:
- The submatrix of swept rows and columns (the upper left 3 x 3 submatrix
in this example) equals the inverse of the corresponding submatrix of the
original C. This submatrix is always symmetric. This inverse matrix can be
used to compute the standard errors and covariances of the regression coefficients,
though it is beyond the scope of this article to describe exactly how.
- The submatrix of unswept rows and columns (the lower right 2 x 2
submatrix in this example) is the matrix of residual variances and covariances.
In this example, in the regression predicting variable 4 from variables 1, 2, 3,
the residuals have a variance of 3.9624, in comparison to the total variance of
9 visible in the original C matrix. In the regression predicting variable 5 from
variables 1, 2, 3, the residuals have a variance of .7895, in comparison to a
total variance of 8. And the covariance between the two sets of residuals is
1.3684. From these values one can easily compute R2 values; for variable 4
we have R2 = 1 - 3.9624/9 = .5597,
while for variable 5 we have R2 = 1 -
.7895/8 = .9013. All these variances and covariances are defined with N, not
N-1, in the denominators. For instance, each variance equals
(sum of squared residuals)/N.
- The submatrix corresponding to the swept rows and the unswept columns
(the upper right 3 x 2 submatrix in this example) contains the regression
coefficients for predicting the unswept variables from the swept ones. In this
example, -.5038, .7744, and .0526 are the coefficients predicting variable 4
from variables 1, 2, 3, while .7368, 1.2105, and -1.3158 are the coefficients
predicting variable 5.
- The remaining submatrix, corresponding to the unswept rows and the swept
columns (the lower left 2 x 3 submatrix in this example) contains no new
information; its transpose is simply -1 times the submatrix described in #3.
SWEEPing is an efficient way to compute a whole series of regressions, as in
stepwise regression. Its principal disadvantage is that rounding error can
accumulate over many sweeps. One way to check on this is to unsweep all the
swept rows -- that is, resweep them, since sweeping is reversible -- to get back
to a matrix that will differ from the original one only due to rounding error,
and then see how much rounding error has accumulated. In fact, SWEEPing in
modern computers is so fast that one can easily do the following:
- Starting with the original matrix C0, perform the desired sweeps and
calculate the desired statistics.
- Unsweep the swept rows, returning to a matrix I'll call C1, which differs
from C0 only due to accumulated rounding error.
- Perform all the sweeps again, starting this time from C1. Compare all the
desired statistics computed in this step to their values computed in step 1.
Presumably the difference between the values found in steps 1 and 3 will
exceed the amount of rounding error in the first computed values, because more sweeps
separated the two computed values than were performed in computing the first
set of computed values. Thus we presumably have an upper limit on the rounding
error in those values.