[R] Vectorization/Speed Problem
Tom Johnson
tjohnson at covad.net
Wed Nov 21 00:42:38 CET 2007
Hi,
I cannot find a 'vectorized' solution to this 'for loop' kind of problem.
Do you see a vectorized, fast-running solution?
Objective:
Take the value of X at each timepoint and calculate the corresponding value
of Y. Leading 0's in X give Y = 0 and every 1 in X gives Y = 1; otherwise Y
is 1 plus the number of consecutive 0's since the most recent 1. The
frequency and distribution of 1's in X vary widely, and a vector of 10k
timepoints may contain runs of ~100 repeated 0's or 1's.
Example:
time  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
X     0  1  0  1  0  1  0  0  1  1  1  0  0  0  .  .
Y     0  1  2  1  2  1  2  3  1  1  1  2  3  4  .  .
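To pin the rule down, here is a minimal reference sketch as a plain loop over one vector. It is only meant to state the expected output unambiguously, not to be fast. The function name y_loop is a hypothetical helper chosen for illustration, and the sketch assumes X contains only 0's and 1's.

```r
# Reference (slow) implementation of the rule for one 0/1 vector.
# y_loop is a hypothetical name used only for this illustration.
y_loop <- function(x) {
  y <- numeric(length(x))
  seen_one <- FALSE                 # becomes TRUE once the first 1 is met
  for (i in seq_along(x)) {
    if (x[i] == 1) {
      y[i] <- 1                     # every 1 in X maps to 1 in Y
      seen_one <- TRUE
    } else if (seen_one) {
      y[i] <- y[i - 1] + 1          # 0 after a 1: keep counting up
    } else {
      y[i] <- 0                     # leading 0's stay 0
    }
  }
  y
}

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)
y_loop(x)
# 0 1 2 1 2 1 2 3 1 1 1 2 3 4
```

Any faster candidate can be checked against this loop on random 0/1 input.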
What I have done:
My for() and apply()-based solutions are too slow: they are 6 times slower
than my prototype vectorized code, which uses cumsum(). However(!)... the
prototype's results are inaccurate, and I can't correct them without
introducing a for()! Here is my shot at a vectorized solution, as far as I
can take it.
Preliminary Vectorized Code:
X <- matrix(sample(c(1,0,0,0,0), 500, replace = TRUE), 25, 20, byrow = TRUE)
colnames(X) <- paste("a", 1:20, sep = "")
# Work matrices, all with the same shape as X
noX <- X; noX[X != 0] <- 0
cumX <- noX; cumNoX <- noX; Y1 <- noX; Y2 <- X; Y3 <- X
for (e in 1:ncol(X)) {
  cumX[, e] <- cumsum(X[, e])                    # running count of 1's
  noX[X[, e] < 1 & cumsum(X[, e]) > 0, e] <- 1   # flag 0's after the first 1
  cumNoX[, e] <- cumsum(noX[, e])                # running count of flagged 0's
}
Y1[cumNoX > 0] <- cumNoX[cumNoX > 0] + 1
Y2[X == 0 & noX > 0] <- Y1[X == 0 & noX > 0]
Y3 <- Y2
Y3[cumX > 1 & noX > 0] <- Y2[cumX > 1 & noX > 0] - cumX[cumX > 1 & noX > 0]
X; Y3
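One direction that might avoid the per-timepoint loop is to use cummax() rather than cumsum(): if the rule is read as "Y is 1 plus the distance to the most recent 1, and 0 before the first 1", then the position of the most recent 1 is a running maximum. The sketch below is only that, a sketch under this reading of the rule. The name y_cummax and the small test matrix X2 are made up for illustration, it assumes X holds only 0's and 1's, and it has not been timed against the real data.

```r
# Sketch: vectorized over timepoints using cummax().
# y_cummax is a hypothetical name; x is assumed to hold only 0's and 1's.
y_cummax <- function(x) {
  idx <- seq_along(x)
  # Position of the most recent 1 at each timepoint (0 before the first 1)
  last_one <- cummax(idx * (x == 1))
  # 1 + distance to the most recent 1, forced to 0 while no 1 has been seen
  (idx - last_one + 1) * (last_one > 0)
}

x <- c(0, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0)
y_cummax(x)
# 0 1 2 1 2 1 2 3 1 1 1 2 3 4

# Column-wise use on a matrix; X2 is a small made-up example
X2 <- cbind(a1 = x, a2 = rev(x))
Y <- apply(X2, 2, y_cummax)
Y[, "a2"]
# 0 0 0 1 1 1 2 3 1 2 1 2 1 2
```

The apply() call still loops over columns, as the for() over columns does in the prototype, but the work within each column is done by vectorized primitives.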
Your help would be greatly appreciated! I'm stuck.
Thank you,
Tom Johnson