[R] Which columns give rise to linear dependency?

John Fox jfox at mcmaster.ca
Tue Nov 5 16:03:12 CET 2002

Dear Michael,

There are several ways of finding near dependencies. For example, Belsley, 
Kuh, and Welsch in Regression Diagnostics (1980) use the singular-value 
decomposition. Here are a couple of simple approaches:

(1) Use a principal-components analysis of the standardized X-matrix. Very 
small component variances correspond to near collinearities, and the 
corresponding principal-component coefficients give you linear combinations 
of the standardized x's that are nearly equal to 0.

(2) Look at the variance-inflation factors. Very large VIFs correspond to 
variables that are nearly linearly dependent on others; regress each such 
variable on the others to see what the dependencies are. (Some of these 
regressions will be redundant.)
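
As a rough sketch of both approaches in R (not a definitive recipe): the data below are simulated purely for illustration, with x3 built to be nearly x1 + x2, and all object names are hypothetical. The VIFs are taken as the diagonal of the inverse of the correlation matrix, so 1 - 1/VIF is the squared multiple correlation of each column with the others. This assumes the dependency is near rather than exact; with an exact dependency solve() will fail, and the principal-components route is the one to use.

```r
# Simulated example (an assumption, not the poster's data):
# x3 is nearly x1 + x2; x4 is unrelated to the rest.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + x2 + rnorm(n, sd = 0.01)
x4 <- rnorm(n)
X  <- cbind(x1, x2, x3, x4)

# (1) Principal components of the standardized X-matrix.
pc       <- prcomp(X, center = TRUE, scale. = TRUE)
comp_var <- pc$sdev^2                 # component variances, largest first
# Coefficients of the smallest-variance component: a linear combination
# of the standardized x's that is nearly 0.
near_zero_combo <- pc$rotation[, ncol(X)]

# (2) Variance-inflation factors from the inverse correlation matrix.
vif <- diag(solve(cor(X)))
names(vif) <- colnames(X)
r2  <- 1 - 1 / vif                    # squared multiple correlations

# Regress the variable with the largest VIF on the others to see
# what the dependency looks like.
d      <- as.data.frame(X)
worst  <- names(which.max(vif))
others <- setdiff(colnames(X), worst)
fit    <- lm(reformulate(others, response = worst), data = d)

print(round(comp_var, 5))
print(round(near_zero_combo, 3))
print(round(vif, 1))
print(round(coef(fit), 3))
```

In this sketch the last component variance should be tiny, and its coefficients should be sizeable for x1, x2 and x3 (with x3 opposite in sign to the other two) and close to 0 for x4, which identifies the subset of columns involved.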

I hope that this helps,

At 12:24 PM 11/5/2002 +0000, Michael Dewey wrote:
>Short version
>If I have a data frame X and I suspect
>that there is a dependency between
>the columns how do I confirm that,
>and how do I tell which subset of columns
>is involved?
>Long version
>A colleague had been trying to use
>the SPSS RELIABILITY procedure.
>It told her that the determinant of the
>matrix was small. She asked me what that meant
>and I told her that one of her variables was a
>linear combination of others.
>I agreed to investigate further and imported
>the datasets into R. The rows of each X represent
>people, and the columns items. The x_{ij} are binary (coded
>0/1). Three of the datasets gave the
>error message from SPSS. I confirmed that
>the matrix involved was indeed var(X)
>and that det(var(X)) agreed with SPSS.
>What I thought was that I would find
>that the smallest eigenvalues would
>be zero, but in two of the datasets that was not true.
>In the third dataset I traced the problem quickly
>to a pair of items which were
>perfectly correlated.
>1 I suspect that det(var(X)) is a poor test of
>   whether X is of reduced rank. I have also looked at kappa(X)
>   which gives values of 10 and 17 for the two offending scales,
>   but I have no feel for whether that is high (bad?).
>2 I thought that by doing svd(X) and then
>   examining V I could answer my problem.
>   However the elements of V, specifically
>   the last column, did not show what I
>   hoped: most values effectively
>   zero and the rest adding to zero.
>   This did work for the third dataset though.
>3 I think that SPSS was trying to invert
>   var(X) in order to compute the multiple
>   correlation of each item with the others.
>   Is there any neat way of doing that in R?
>I am using 1.5.1 on Windows 98 if that makes
>a difference.
>If anyone wants to look at one of the datasets
>I have her permission to make it available.
>Point your browser at http://www.aghmed.fsnet.co.uk/r.html
>Michael Dewey
>michael.dewey at nottingham.ac.uk
>r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
>Send "info", "help", or "[un]subscribe"
>(in the "body", not the subject !)  To: r-help-request at stat.math.ethz.ch

John Fox
Department of Sociology
McMaster University
Hamilton, Ontario, Canada L8S 4M4
email: jfox at mcmaster.ca
phone: 905-525-9140x23604
web: www.socsci.mcmaster.ca/jfox
