[R] Vectorization/Speed Problem

Tom Johnson tjohnson at covad.net
Wed Nov 21 00:42:38 CET 2007


Hi,

I cannot find a 'vectorized' solution to this 'for loop' kind of problem.
Do you see a vectorized, fast-running solution?

Objective:
Take the value of X at each timepoint and calculate the corresponding value
of Y.  Leading 0's in X give Y = 0 and every 1 in X gives Y = 1; each 0 after
a 1 gives Y = 1 plus the number of consecutive 0's since the last 1.  The
frequency and distribution of X vary widely and may have ~100 repeated 0's or
1's in a vector of 10k timepoints.

Example:
time 1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
X    0   1   0   1   0   1   0   0   1   1   1   0   0   0   . .
Y    0   1   2   1   2   1   2   3   1   1   1   2   3   4   . .
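
To pin the rule down, here is a minimal (and slow) loop version for a single
vector.  It is only a sketch: the name y_loop is made up for illustration, and
it assumes x contains nothing but 0's and 1's.

```r
# Reference implementation of the rule for one vector (hypothetical helper).
# Assumes x contains only 0's and 1's.
y_loop <- function(x) {
  y <- numeric(length(x))
  run <- 0                      # stays 0 until the first 1 has been seen
  for (i in seq_along(x)) {
    if (x[i] == 1) {
      run <- 1                  # every 1 resets the count to 1
    } else if (run > 0) {
      run <- run + 1            # one more 0 since the last 1
    }
    y[i] <- run                 # leading 0's leave run at 0
  }
  y
}

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)
y_loop(x)                       # reproduces the Y row of the example above
```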

What I have done:
My for() and apply()-related standard solutions are too slow.  They are 6
times slower than my prototype vectorized code, which uses cumsum().
However(!)... the prototype's results are inaccurate and I can't correct them
without introducing a for()!  Here is my shot at a vectorized solution, as far
as I can take it.

Preliminary Vectorized Code:
# Random 25 x 20 test matrix of 0's and 1's (about 20% 1's)
X <- matrix(sample(c(1,0,0,0,0), 500, replace = TRUE), 25, 20, byrow=TRUE)
colnames(X) <- paste("a", 1:20, sep="")
# Work matrices: noX starts as an all-zero copy of X
noX <- X; noX[X!=0] <- 0
cumX <- noX; cumNoX <- noX; Y1 <- noX; Y2 <- X; Y3 <- X

for (e in 1:ncol(X)) {
	cumX[,e] <- cumsum(X[,e])                     # running count of 1's
	noX[X[,e] < 1 & cumsum(X[,e]) > 0 ,e] <- 1    # flag 0's after the first 1
	cumNoX[,e] <- cumsum(noX[,e])                 # running count of flagged 0's
	}
Y1[cumNoX > 0] <- cumNoX[cumNoX > 0] + 1
Y2[X == 0 & noX > 0] <- Y1[X == 0 & noX > 0]
Y3 <- Y2
# Attempted correction: subtract the running count of 1's
Y3[cumX > 1 & noX > 0] <- Y2[cumX > 1 & noX > 0] - cumX[cumX > 1 & noX > 0]
X; Y3
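
For reference, one possible way to avoid the element-wise loop is to track the
position of the most recent 1 with cummax() and take the distance from it.
This is a sketch under assumptions, not a tested drop-in replacement for the
code above: the helper name y_vec is made up, and x is assumed to hold only
0's and 1's.

```r
# Sketch: y_vec is a hypothetical helper; assumes x holds only 0's and 1's.
y_vec <- function(x) {
  idx   <- seq_along(x)
  # Index of the most recent 1 at each position (0 before the first 1)
  last1 <- cummax(ifelse(x == 1, idx, 0L))
  # Leading 0's give 0; otherwise distance from the last 1, plus 1
  ifelse(last1 == 0L, 0L, idx - last1 + 1L)
}

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)
y_vec(x)

# Matrix case: one call per column, no loop over individual timepoints
X <- matrix(sample(c(1, 0, 0, 0, 0), 500, replace = TRUE), 25, 20, byrow = TRUE)
Y <- apply(X, 2, y_vec)
```

Only cummax(), ifelse() and seq_along() from base R are used, so the per-column
cost should stay close to that of the cumsum() prototype.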

Your help would be greatly appreciated!  I'm stuck.
Thank you,

Tom Johnson


