[R] Numbering sequences of non-NAs in a vector
Marc Schwartz
marc_schwartz at me.com
Tue Jul 7 23:53:31 CEST 2009
On Jul 7, 2009, at 4:08 PM, Krishna Tateneni wrote:
> Greetings, I have a vector of the form:
> [10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9...] That is, a combination
> of sequences of non-missing values and missing values, with each sequence
> possibly of a different length.
>
> I'd like to create another vector which will help me pick out the sequences
> of non-missing values. For the example above, this would be:
> [1,1,1,1,1,1,NA,NA,NA,NA,2,2,2,NA,NA,NA,3,3,3,3...]. The goal ultimately is
> to calculate means separately for each sequence.
>
> Your help is appreciated. If I'm making this more complicated than
> necessary, I'd appreciate knowing that as well!
>
> Many thanks.
Here is one possibility:
Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
> Vec
[1] 10 8 1 3 0 8 NA NA NA NA 2 1 6 NA NA NA 0 5 1 9
Use rle() on is.na(Vec) to get the lengths of the alternating runs of NA and non-NA values. See ?rle.
Runs <- rle(is.na(Vec))
> Runs
Run Length Encoding
lengths: int [1:5] 6 4 3 3 4
values : logi [1:5] FALSE TRUE FALSE TRUE FALSE
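For anyone curious what rle() is doing here, below is a minimal base-R sketch of the same computation: find the positions where the value changes, treat those as run ends, and difference them to get the run lengths. It closely mirrors the body of rle() in base R, but it is meant as an illustration, not a replacement; the names IsNA, Changes, Ends, Lengths and Values are just for this example.

```r
Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
IsNA <- is.na(Vec)
n <- length(IsNA)

# TRUE wherever an element differs from the one after it, i.e. a run ends there
Changes <- IsNA[-1L] != IsNA[-n]

# Run end positions: every change point, plus the final element
Ends <- c(which(Changes), n)

# Run lengths are the gaps between consecutive run ends
Lengths <- diff(c(0L, Ends))

# The value of each run is the value at its last position
Values <- IsNA[Ends]

Lengths  # 6 4 3 3 4
Values   # FALSE TRUE FALSE TRUE FALSE
```

These should match Runs$lengths and Runs$values above.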
Create a grouping vector that gives each element the number of the run it belongs to:
Grps <- rep(seq(length(Runs$lengths)), Runs$lengths)
> Grps
[1] 1 1 1 1 1 1 2 2 2 2 3 3 3 4 4 4 5 5 5 5
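Note that Grps numbers every run, NA runs included, whereas the original question asked for a vector that numbers only the non-NA runs and has NA elsewhere. A minimal sketch of one way to get exactly that from the same rle() output: number the non-NA runs with cumsum(), set the NA runs to NA, then expand with rep(). The names Ids and Seq are just for illustration. tapply() can then give the per-sequence means, since it drops elements whose group is NA.

```r
Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))

# Count only the non-NA runs: values F T F T F give 1 1 2 2 3
Ids <- cumsum(!Runs$values)

# Runs of NAs get NA instead of a sequence number
Ids[Runs$values] <- NA

# Expand back to one entry per element of Vec
Seq <- rep(Ids, Runs$lengths)
Seq
# 1 1 1 1 1 1 NA NA NA NA 2 2 2 NA NA NA 3 3 3 3

# Mean of each non-NA sequence; elements with an NA group are dropped
tapply(Vec, Seq, mean)
# means 5, 3 and 3.75 for sequences 1, 2 and 3
```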
Now get the mean of each run, using Grps as the grouping variable. See ?aggregate.
> aggregate(Vec, list(Grps = Grps), mean)
Grps x
1 1 5.00
2 2 NA
3 3 3.00
4 4 NA
5 5 3.75
The NA runs give NA means, since mean() of values that are all NA is NA. If you don't want them included in the result, you could use subset():
> subset(aggregate(Vec, list(Grps = Grps), mean), !is.na(x))
Grps x
1 1 5.00
3 3 3.00
5 5 3.75
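Alternatively, one could drop the NA elements before aggregating, so that mean() is never called on an all-NA run and no subset() step is needed. A minimal sketch, assuming the same Vec and Grps as above (recomputed here so the snippet runs on its own; Keep and Means are just illustrative names):

```r
Vec <- c(10,8,1,3,0,8,NA,NA,NA,NA,2,1,6,NA,NA,NA,0,5,1,9)
Runs <- rle(is.na(Vec))
Grps <- rep(seq_along(Runs$lengths), Runs$lengths)

# Keep only the non-missing elements and their run numbers
Keep <- !is.na(Vec)

# Groups 2 and 4 (the NA runs) simply do not appear in the result
Means <- aggregate(Vec[Keep], list(Grps = Grps[Keep]), mean)
Means
```

This should give the same three rows as the subset() version: runs 1, 3 and 5 with means 5, 3 and 3.75.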
HTH,
Marc Schwartz
More information about the R-help mailing list