[R] Pairwise n for large correlation tables?
Adam D. I. Kramer
adik at ilovebacon.org
Fri Aug 11 08:02:02 CEST 2006
On Tue, 8 Aug 2006, ggrothendieck at gmail.com wrote:
> Try this:
>
> # mat is test matrix
> mat <- matrix(1:25, 5)
> mat[2,2] <- mat[3,4] <- NA
> crossprod(!is.na(mat))
Exactly what I was looking for! Thanks.
--Adam
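
For anyone reading this in the archive, here is a rough sketch of why the one-liner works, as far as I understand it: `!is.na(mat)` is a TRUE/FALSE indicator matrix, and `crossprod()` computes `t(x) %*% x` with the logicals treated as 0/1, so cell [i,j] is the number of rows in which columns i and j are both non-missing. The brute-force loop and the names `ok`, `n_fast` and `n_slow` are mine, added only as a check; they are not from the suggestion.

```r
# Same test matrix as in the quoted suggestion
mat <- matrix(1:25, 5)
mat[2, 2] <- mat[3, 4] <- NA

ok <- !is.na(mat)          # TRUE where a value is present
n_fast <- crossprod(ok)    # t(ok) %*% ok, logicals treated as 0/1

# Brute-force check (illustrative only): count rows present in both columns
p <- ncol(mat)
n_slow <- matrix(0, p, p)
for (i in seq_len(p)) {
  for (j in seq_len(p)) {
    n_slow[i, j] <- sum(ok[, i] & ok[, j])
  }
}

# The two approaches should agree cell by cell
stopifnot(all(n_fast == n_slow))
```

The diagonal holds the per-column non-missing counts, and the off-diagonal cells hold the pairwise counts.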
>
>
> On 8/7/06, Adam D. I. Kramer <adik at ilovebacon.org> wrote:
>> Hello,
>>
>> I'm using a very large data set (n > 100,000 for 7 columns), for which I'm
>> pretty happy dealing with pairwise-deleted correlations to populate my
>> correlation table. E.g.,
>>
>> a <- cor(cbind(col1, col2, col3),use="pairwise.complete.obs")
>>
>> ...however, I am interested in the number of cases used to compute each
>> cell of the correlation table. I am unable to find such a function via
>> google searches, so I wrote one of my own. This turns out to be highly
>> inefficient (e.g., it takes much, MUCH longer than the correlations do). Any
>> hints, regarding other functions to use or ways to make this speedier, would
>> be much appreciated!
>>
>> pairwise.n <- function(df=stop("Must provide data frame!")) {
>>   if (!is.data.frame(df)) {
>>     df <- as.data.frame(df)
>>   }
>>   colNum <- ncol(df)
>>   result <- matrix(data=NA,nrow=colNum,ncol=colNum,dimnames=list(colnames(df),colnames(df)))
>>   # only the upper triangle (j >= i) is filled in
>>   for(i in 1:colNum) {
>>     for (j in i:colNum) {
>>       result[i,j] <- length(df[!is.na(df[i])&!is.na(df[j])])/colNum
>>     }
>>   }
>>   result
>> }
>>
>> --
>> Adam D. I. Kramer
>> University of Oregon
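
A possible drop-in replacement for the function quoted above, sketched under the assumption that the input is a data frame or matrix. The name `pairwise_n` is hypothetical (not from the thread), and the built-in `airquality` data set is used only because it ships with R and contains missing values.

```r
# Hypothetical helper: pairwise non-missing counts via the crossprod() approach
pairwise_n <- function(x) {
  ok <- !is.na(as.matrix(x))   # indicator matrix, keeps column names
  crossprod(ok)                # cell [i, j] = rows where both columns are present
}

cols <- c("Ozone", "Solar.R", "Wind")
n <- pairwise_n(airquality[, cols])
r <- cor(airquality[, cols], use = "pairwise.complete.obs")

# n and r have the same layout, so n[i, j] is the case count behind r[i, j]
n
r
```

Unlike the loop version, this fills the whole symmetric matrix rather than just the upper triangle.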