[R] fast way to compare two matrices of combinations
Mark W Kimpel
mwkimpel at gmail.com
Thu Mar 13 17:23:57 CET 2008
I have a list (length 750), each element containing a vector of unique
strings (unique gene ids), with length up to ~40 (median 15). I want to
compile a matrix of all possible triplets of ids, together with the number
of list elements in which each triplet occurs. Using combn and a lot of
looping, I am accomplishing this, but it is VERY slow.
I've tried to figure out a way to vectorize this, using "match" and
"%in%", but can't get my mind around it.
Below is my code. sig.tf.pairs is the list. Suggestions?
Mark
############################################################
M <- 3 # 3 for triplets, etc.
##########################################################
# count all triplets
all.triplets <- NULL   # M x K matrix, one unique sorted triplet per column
all.count.vec <- NULL  # all.count.vec[k] = number of list elements containing column k
for (i in seq_along(sig.tf.pairs)) {
  if (length(sig.tf.pairs[[i]]) >= M) {
    triplets <- combn(sig.tf.pairs[[i]], M, simplify = TRUE)
    # sort within each column so the same triplet always looks the same
    for (j in seq_len(ncol(triplets))) {
      triplets[, j] <- sort(triplets[, j])
    }
    count.vec <- rep(1, ncol(triplets))
    if (is.null(all.count.vec)) {
      all.count.vec <- count.vec
      all.triplets <- triplets
    } else {
      redundant.vec <- NULL
      for (k in seq_len(ncol(all.triplets))) {
        for (m in seq_len(ncol(triplets))) {
          # ids are unique within a triplet, so M shared ids means same triplet
          if (length(intersect(triplets[, m], all.triplets[, k])) == M) {
            all.count.vec[k] <- all.count.vec[k] + 1
            redundant.vec <- c(redundant.vec, m)
          }
        }
      }
      if (!is.null(redundant.vec)) {
        # drop = FALSE keeps a matrix even when a single column remains
        triplets <- triplets[, -redundant.vec, drop = FALSE]
        count.vec <- count.vec[-redundant.vec]
      }
      all.triplets <- cbind(all.triplets, triplets)
      all.count.vec <- c(all.count.vec, count.vec)
    }
  }
}
###################################
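One direction that might avoid the nested comparison loops is to collapse each sorted triplet into a single string key and let table() do the counting. This is only a sketch on toy data, not the real list: the gene ids, the helper triplet.keys, the names all.keys and key.counts, and the "|" separator are all made up for illustration, and it assumes no gene id contains "|".

```r
# Toy stand-in for sig.tf.pairs (hypothetical ids, not the real data)
sig.tf.pairs <- list(
  c("g1", "g2", "g3", "g4"),
  c("g2", "g3", "g4"),
  c("g1", "g2"),              # shorter than M, so it contributes nothing
  c("g4", "g3", "g2", "g5")
)
M <- 3

# Hypothetical helper: one string key per triplet in a single list element.
# combn() keeps the order of its input, so sorting the ids once up front
# leaves every column already sorted.
triplet.keys <- function(ids, m) {
  if (length(ids) < m) return(character(0))
  apply(combn(sort(ids), m), 2, paste, collapse = "|")
}

# Gather the keys from every list element, then count them in one pass.
all.keys <- unlist(lapply(sig.tf.pairs, triplet.keys, m = M))
key.counts <- table(all.keys)

# Rebuild the M x K matrix of triplets and the matching count vector.
all.triplets <- do.call(cbind, strsplit(names(key.counts), "|", fixed = TRUE))
all.count.vec <- as.vector(key.counts)
```

Because the ids are unique within each list element, a given triplet can appear at most once per element, so the table count is the number of list elements containing that triplet. The same keys could also be compared across two matrices with match() or %in% instead of intersect() on every pair of columns.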
--
Mark W. Kimpel MD ** Neuroinformatics ** Dept. of Psychiatry
Indiana University School of Medicine
15032 Hunter Court, Westfield, IN 46074
(317) 490-5129 Work, & Mobile & VoiceMail
(317) 204-4202 Home (no voice mail please)
mwkimpel<at>gmail<dot>com