[R] Computing the rank of a matrix.
Ravi Varadhan
rvaradhan at jhmi.edu
Tue Jul 24 23:35:25 CEST 2007
Hi Martin,
I just realized (courtesy: ?qr) that LAPACK=TRUE always gives full rank, no
matter what the matrix and tolerance are. So, clearly my results using
LAPACK=TRUE should be ignored. So, the real comparison is only between
rankMat and qr(., LAPACK=FALSE)$rank. I cant help but fell that there can
be no "correct" solution to an ill-posed problem. Furthermore, I haven't
come across a real application where the numerical estimate of a rank is
directly useful.
Best,
Ravi.
----------------------------------------------------------------------------
-------
Ravi Varadhan, Ph.D.
Assistant Professor, The Center on Aging and Health
Division of Geriatric Medicine and Gerontology
Johns Hopkins University
Ph: (410) 502-2619
Fax: (410) 614-9625
Email: rvaradhan at jhmi.edu
Webpage: http://www.jhsph.edu/agingandhealth/People/Faculty/Varadhan.html
----------------------------------------------------------------------------
--------
-----Original Message-----
From: Martin Maechler [mailto:maechler at stat.math.ethz.ch]
Sent: Tuesday, July 24, 2007 1:43 PM
To: RAVI VARADHAN
Cc: Martin Maechler; r-help at stat.math.ethz.ch
Subject: Re: [R] Computing the rank of a matrix.
>>>>> "RV" == RAVI VARADHAN <rvaradhan at jhmi.edu>
>>>>> on Sat, 07 Apr 2007 18:39:36 -0400 writes:
^^^^^^^^^^^^^^^^
this is a bit a late reply... better late than never
RV> Hi Martin,
Hi Ravi, thanks for your research.
RV> I played a bit with rankMat. I will first state the
RV> conclusions of my numerical experiments and then present
RV> the results to support them:
RV> 1. I don't believe that rankMat, or equivalently
RV> Matlab's rank computation, is necessarily better than
RV> qr(.)$rank or (qr., LAPACK=TRUE)$rank. In fact, for the
RV> notorious Hilbert matrix, rankMat can give poor
RV> estimates of rank.
RV> 2. There exists no universally optimal (i.e. optimal
RV> for all A) tol in qr(A, tol)$rank that would be
RV> optimally close to rankMat. The discrepancy in the
RV> ranks computed by qr(A)$rank and rankMat(A) obviously
RV> depends on the matrix A. This is evident from the tol
RV> used in rankMat:
RV> tol <- max(d) * .Machine$double.eps * abs(singValA[1])
RV> So, this value of tol in qr will also minimize the discrepancy.
RV> 3. The tol parameter is irrelevant in qr(A, tol,
RV> LAPACK=TRUE)$rank, i.e. LAPACK doesn't seem to utilize
RV> the tol parameter when computing the rank. However,
RV> qr(A, tol, LAPACK=FALSE)$rank does depend on tol.
Yes, indeed! The help page of qr() actually says so
{at least now, don't know about 3 months ago}.
RV> Now, here are the results:
RV> 1.
>> hilbert.rank <- matrix(NA,20,3)
>> hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
>> for (i in 1:20) {
RV> + himat <- hilbert(i)
RV> + hilbert.rank[i,1] <- rankMat(himat)
RV> + hilbert.rank[i,2] <- qr(himat,tol=1.0e-14)$rank
RV> + hilbert.rank[i,3] <- qr(himat, tol = .Machine$double.eps, LAPACK =
TRUE)$rank
RV> + }
>> hilbert.rank
RV> [,1] [,2] [,3]
RV> [1,] 1 1 1
RV> [2,] 2 2 2
RV> [3,] 3 3 3
RV> [4,] 4 4 4
RV> [5,] 5 5 5
RV> [6,] 6 6 6
RV> [7,] 7 7 7
RV> [8,] 8 8 8
RV> [9,] 9 9 9
RV> [10,] 10 10 10
RV> [11,] 10 11 11
RV> [12,] 11 12 12
RV> [13,] 11 12 13
RV> [14,] 11 13 14
RV> [15,] 12 13 15
RV> [16,] 12 15 16
RV> [17,] 12 16 17
RV> [18,] 12 16 18
RV> [19,] 13 17 19
RV> [20,] 13 17 20
RV> We see that rankMat underestimates the rank for n > 10, but
qr(himat, tol = .Machine$double.eps, LAPACK=TRUE)$rank gets it right.
Yes, indeed; and that's seems a bit against the ``common
knowledge'' that svd() is more reliable than qr()
Hmm.... I'm still baffled a bit..
Note that with the Hilbert matrices,
one might argue that hilbert(20) might really not have a
correct "estimated rank" of 20,
but at least for hilbert(13) or so, the rank should be == n.
BTW, there's a nice plot, slightly related to this problem,
using rcond() from the Matrix package :
library(Matrix)
hilbert <- function(n) { i <- 1:n; 1 / outer(i - 1, i, "+") }
rcHilb <- sapply(1:20, function(n) rcond(Matrix(hilbert(n))))
plot(rcHilb, xlab = "n", log = "y", col = 2, type = "b",
main = "reciprocal condition numbers of Hilbert(n)")
where one sees that the LAPACK code that underlies
Matrix::rcond() also gets into "problem" when estimating the
condition number for hilbert(n) when n >~ 16 .
I think if we wanted to make real progress here, we'd have to
consult with numerical analyist specialists.
But for me the topic is too remote to be followed up further at
the moment.
One conclusion for me is that to estimate the rank of a matrix
in current versions of R, one should use
rankMat <- function(x) qr(x, LAPACK = TRUE)$rank
(as was suggested as one possibility in the original thread)
Regards, and thank you again, Ravi.
Martin Maechler, ETH Zurich
RV> 2.
RV> Here I first, created a random rectangular matrix, and then added a
number of rows to it, where these new rows are the same as some of the
original rows except for a tiny amount of noise, which I call eps. So, the
degree of rank deficiency is controlled by eps. I let eps take on 3 values:
1.e-07, 1.e-10, and 1.e-14, and show that the optimal tol (in terms of being
close to rankMat) depends on the level of precision (eps) in the matrix.
>> set.seed(123)
>> nrow <- 15
>> ncol <- 20
>> nsim <- 5000
>> ndefic <- 4 # number of "nearly" dependent rows
>> eps <- 1.e-07
>> rnk <- matrix(NA, nsim, 5)
>> for (j in 1:nsim) {
RV> + A <- matrix(rnorm(nrow*ncol),nrow, ncol)
RV> + newrows <- matrix(NA, ndefic, ncol)
RV> + for (i in 1:ndefic) {
RV> + newrows[i,] <- A[nrow-i,] + rnorm(ncol, sd=min(abs(A[nrow-i+1,]))*
eps)
RV> + }
RV> +
RV> + B <- rbind(A,newrows)
RV> + rnk[j,1] <- rankMat(B)
RV> + rnk[j,2] <- qr(B, tol = 1.e-07)$rank
RV> + rnk[j,3] <- qr(B, tol = 1.e-11)$rank
RV> + rnk[j,4] <- qr(B, tol = 1.0e-14)$rank
RV> + rnk[j,5] <- qr(B, tol = .Machine$double.eps, LAPACK = TRUE)$rank
RV> + }
>> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
RV> [1] 19351 53 0 0
RV> We observe that both qr(B, tol=1.e-14)$rank and qr(B, tol =
.Machine$double.eps, LAPACK = TRUE)$rank give exactly the same rank as
rankMat.
RV> Now, we change eps <- 1.e-10 and the results are:
>> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
RV> [1] 19778 14263 166 220
RV> This means that a tolerance of 1.e-14 works best.
RV> Now we lower eps further: eps <- 1.e-14
>> apply(abs(rnk[,1] - rnk[,2:5]),2,sum)
RV> [1] 0 3 665 20000
RV> Clearly, the smaller tolerances yield rank estimates that are higher
than that given by rankMat. That is, rankMat underestimates the rank as in
the case of Hilbert matrix.
RV> 3.
RV> Now we show that qr(., tol, LAPACK=TRUE)$rank is independent of tol:
>> exp <- -(7:16)
>> tol <- 10^exp
>> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x,
LAPACK=FALSE)$rank)
RV> [1] 10 12 14 14 15 16 16 17 18 19
>> sapply(tol, A=hilbert(20), function(x,A)qr(A, tol=x,
LAPACK=TRUE)$rank)
RV> [1] 20 20 20 20 20 20 20 20 20 20
RV> Looking forward to comments.
RV> Best,
RV> Ravi.
RV> ----- Original Message -----
RV> From: Martin Maechler <maechler at stat.math.ethz.ch>
RV> Date: Saturday, April 7, 2007 10:57 am
RV> Subject: Re: [R] Computing the rank of a matrix.
RV> To: Ravi Varadhan <rvaradhan at jhmi.edu>
RV> Cc: apjaworski at mmm.com, "'José Luis Aznarte M.'"
<jlaznarte at decsai.ugr.es>, r-help at stat.math.ethz.ch
>> >>>>> "Ravi" == Ravi Varadhan <rvaradhan at jhmi.edu>
>> >>>>> on Fri, 6 Apr 2007 12:44:33 -0400 writes:
>>
Ravi> Hi, qr(A)$rank will work, but just be wary of the
Ravi> tolerance parameter (default is 1.e-07), since the
Ravi> rank computation could be sensitive to the tolerance
Ravi> chosen.
>>
>> Yes, indeed.
>>
>> The point is that rank(<Matrix>)
>> is well defined in pure math (linear algebra), as well as
>> a "singular matrix" is.
>>
>> The same typically no longer makes sense as soon as you enter
>> the real world: A matrix "close to singular" may have to be
>> treated "as if singular" depending on its "singularity
>> closeness" {{ learn about the condition number of a matrix }}
>> and the same issues arise with rank(<matrix>).
>>
>> Of course, the matlab programmers know all this (and much more),
>> and indeed, matlab's rank(A) really is
>> rank(A, tol = tol.default(A))
>>
>> and is based on the SVD instead of QR decomposition since the
>> former is said to be more reliable (even though slightly slower).
>>
>> R's equivalent (with quite a bit of fool-proofing) would be the
>> following function (assuming correct online documentation of matlab):
>>
>>
>> rankMat <- function(A, tol = NULL, singValA = svd(A, 0,0)$d)
>> {
>> ## Purpose: rank of a matrix ``as Matlab''
>> ##
----------------------------------------------------------------------
>> ## Arguments: A: a numerical matrix, maybe non-square
>> ## tol: numerical tolerance (compared to singular
values)
>> ## singValA: vector of non-increasing singular values
>> of A
>> ## (pass as argument if already known)
>> ##
----------------------------------------------------------------------
>> ## Author: Martin Maechler, Date: 7 Apr 2007, 16:16
>> d <- dim(A)
>> stopifnot(length(d) == 2, length(singValA) == min(d),
>> diff(singValA) < 0) # must be sorted decreasingly
>> if(is.null(tol))
>> tol <- max(d) * .Machine$double.eps * abs(singValA[1])
>> else stopifnot(is.numeric(tol), tol >= 0)
>> sum(singValA >= tol)
>> }
>>
>>
>> A small scale simulation with random matrices,
>> i.e., things like
>>
>> ## ranks of random matrices; here will have 5 all the time:
>> table(replicate(1000, rankMat(matrix(rnorm(5*12),5, 12) )))# < 1 sec.
>>
>> indicates that qr(.)$rank gives the same typically,
>> where I assume one should really use
>>
>> qr(., tol = .Machine$double.eps, LAPACK = TRUE)$rank
>>
>> to be closer to Matlab's default tolerance.
>>
>> Ok, who has time to investigate further?
>> Research exercise:
>>
1> Is there a fixed number, say t0 <- 1e-15
1> for which qr(A, tol = t0, LAPACK=TRUE)$rank is
1> ``optimally close'' to rankMat(A) ?
>>
2> how easily do you get cases showing svd(.) to more reliable
2> than qr(., LAPACK=TRUE)?
>>
>> To solve this in an interesting way, you should probably
>> investigate classes of "almost rank-deficient" matrices,
>> and I'd also be interested if you "randomly" ever get matrices A
>> with rank(A) < min(dim(A)) - 1
>> (unless you construct some columns/rows exactly from earlier
>> ones or use all-0 ones)
>>
>> Martin Maechler, ETH Zurich
>>
>>
>>
>>
Ravi> Ravi.
>>
Ravi> -----------------------------------------------------------------
Ravi> Ravi Varadhan, Ph.D.
Ravi> Assistant Professor, The Center on Aging and Health
>> .......
Ravi>
>>
>>
Ravi>
--------------------------------------------------------------------
>>
>>
>>
>> >> How about
>>
>> >> qr(A)$rank
>>
>> >> or perhaps
>>
>> >> qr(A, LAPACK=TRUE)$rank
>>
>> >> Cheers,
>>
>> >> Andy
>>
>>
>> >> Hi! Maybe this is a silly question, but I need the
>> >> column rank (
>> >> of a matrix and R function 'rank()' only gives me the
>> >> ordering of the elements of my matrix. How can I
>> >> compute the column rank of a matrix? Is there not an R
>> >> equivalent to Matlab's 'rank()'? I've been browsing
>> >> for a time now and I can't find anything, so any help
>> >> will be greatly appreciated. Best regards!
>>
>>
>> >> -- -- Jose Luis Aznarte M.
>> >> Department of Computer
>> >> Science and Artificial Intelligence Universidad de
>> >> Granada Tel. +34 - 958 - 24 04 67 GRANADA (Spain) Fax:
>> >> +34 - 958 - 24 00 79
More information about the R-help
mailing list