[R] running count in data.frame
Mark Knecht (markknecht at gmail.com)
Wed Jul  1 04:49:18 CEST 2009

Hi,
   I need to keep a running count of events that have happened in my
data.frame. I found a document called usingR that had an example of
doing this for random coin flips, and I tried to modify it. It seems to
work at first, but then it stops and I don't understand why. I'm
essentially trying to duplicate the Excel running sum
=SUM($A$1:$A(Row number)).
   The example looked like this:
x = cumsum(sample(c(-1,1),100,replace=T))
which does seem to work (100 shortened to 20 for email):
> cumsum(sample(c(-1,1),20,replace=T))
 [1] 1 0 1 0 1 2 3 4 5 4 3 4 5 6 7 6 5 4 5 6
> cumsum(sample(c(-1,1),20,replace=T))
 [1] 1 2 1 2 1 2 3 2 3 4 5 6 7 8 7 8 7 8 9 8
> cumsum(sample(c(-1,1),20,replace=T))
 [1] 1 0 1 0 1 0 1 2 3 4 5 6 7 8 7 8 7 6 7 8
> cumsum(sample(c(-1,1),20,replace=T))
 [1]  1  0  1  0  1  0 -1  0  1  0  1  0  1  2  1  0  1  2  3  4
> cumsum(sample(c(-1,1),20,replace=T))
 [1]  1  2  1  0 -1  0 -1 -2 -1 -2 -1 -2 -1 -2 -3 -2 -3 -4 -5 -6
However, that example doesn't have to read from the data.frame, so I
tried to build on some help I got earlier today, but it isn't working
for me. The goal is that MyFrame$lc keeps a running total of the events
in the MyFrame$l column, and likewise MyFrame$pc for MyFrame$p. It seems
that $lc starts off OK until it gets to a 0 and then resets back to 0,
which I don't want. The $pc counter never seems to count at all. I also
get a warning message I don't understand, so clearly I'm doing something
very wrong here:
> F1 <- RunningCount(F1)
Warning messages:
1: In MyFrame$pc[pos] <- cumsum(as.integer(pos)) :
  number of items to replace is not a multiple of replacement length
2: In MyFrame$lc[pos] <- cumsum(as.integer(pos)) :
  number of items to replace is not a multiple of replacement length
> F1
    x  y p  l pc lc
1   1 -4 0 -4  0  1
2   2 -3 0 -3  0  2
3   3 -2 0 -2  0  3
4   4 -1 0 -1  0  4
5   5  0 0  0  0  0
6   6  1 1  0  0  0
7   7  2 2  0  0  0
8   8  3 3  0  0  0
9   9  4 4  0  0  0
10 10  5 5  0  0  0
>
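The warning can be looked at in isolation. The sketch below uses hypothetical stand-alone vectors (l, lc, pos are stand-ins for the data.frame columns, not the poster's objects) and only compares lengths: the indexed left-hand side has one slot per TRUE in pos, while cumsum() over the whole logical vector returns one value per row. If that is indeed what happens inside RunningCount, R would copy only the leading values of the cumsum result into the selected rows and warn about the mismatch.

```r
# Hypothetical stand-ins for MyFrame$l and MyFrame$lc
l   <- c(-4, -3, -2, -1, 0, 0, 0, 0, 0, 0)
lc  <- rep(0, 10)

pos <- l < 0                           # TRUE for the first 4 rows only
lhs_slots  <- length(lc[pos])          # slots selected on the left-hand side
rhs_values <- cumsum(as.integer(pos))  # one value per row, not per TRUE

lhs_slots            # 4
length(rhs_values)   # 10
```

Because 4 slots cannot absorb 10 values, an assignment like lc[pos] <- rhs_values would use only the first 4 values and leave the remaining rows untouched.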
I wanted $lc to go up to 4 and then hold at 4 until the end. $pc should
have stayed 0 until line 6 and then gone up to 5 at the end.
Any and all input on what I'm doing wrong is appreciated.
Thanks,
Mark
AddCols = function (MyFrame) {
	MyFrame$p<-0
	MyFrame$l<-0
	MyFrame$pc<-0
	MyFrame$lc<-0
	return(MyFrame)
}
BinPosNeg = function (MyFrame) {
## Positive y in p column, negative y in l column
	pos <- MyFrame$y > 0
	MyFrame$p[pos] <- MyFrame$y[pos]
	MyFrame$l[!pos] <- MyFrame$y[!pos]
	return(MyFrame)
}
RunningCount = function (MyFrame) {
## Running count of p & l events
	pos <- (MyFrame$p > 0)
	MyFrame$pc[pos] <- cumsum(as.integer(pos))
	pos <- (MyFrame$l < 0)
	MyFrame$lc[pos] <- cumsum(as.integer(pos))
	return(MyFrame)
}
F1 <- data.frame(x=1:10, y=-4:5)
F1 <- AddCols(F1)
F1
F1 <- BinPosNeg(F1)
F1
F1 <- RunningCount(F1)
F1
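One possible fix, offered as a sketch rather than the definitive answer: assign the cumulative sum to the whole column instead of to the indexed subset, so both sides have one value per row. RunningCountFixed and F2 are hypothetical names introduced here, and the frame is rebuilt with pmax()/pmin() only to keep the example self-contained; it is assumed to match what AddCols and BinPosNeg produce for y = -4:5.

```r
# Hypothetical corrected version: no [pos] on the left-hand side
RunningCountFixed <- function(MyFrame) {
  # cumsum() of a logical vector counts the TRUEs seen so far,
  # and the count simply holds its value on FALSE rows
  MyFrame$pc <- cumsum(MyFrame$p > 0)
  MyFrame$lc <- cumsum(MyFrame$l < 0)
  MyFrame
}

# Self-contained stand-in for F1 after AddCols() and BinPosNeg()
y  <- -4:5
F2 <- data.frame(x = 1:10, y = y, p = pmax(y, 0), l = pmin(y, 0))
F2 <- RunningCountFixed(F2)

F2$lc
# [1] 1 2 3 4 4 4 4 4 4 4
F2$pc
# [1] 0 0 0 0 0 1 2 3 4 5
```

This is the same idea as the Excel running sum =SUM($A$1:$A(Row number)): row i of cumsum(v) holds sum(v[1:i]).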
    
    