[R] to match samples by minute
Zhang Weiwu
zhangweiwu at realss.com
Thu Aug 15 18:31:04 CEST 2013
Perhaps this is simple and common, but it took me quite a while to admit I
cannot solve it in a simple way.
The data frame `df` has the following columns:
unixtime, value, factor
Now I need a matrix of:
unixtime, value-difference-between-factor1-and-factor2
The naive solution is:
df$value[df$factor == "factor1"] - df$value[df$factor == "factor2"]
It won't work, because factor1 has 1000 valid samples while factor2 has 1400,
so the two sets of rows do not line up. The invalid samples are dropped
on-site, i.e. removed before the data is piped into R.
To solve it, I have two ideas.
1. Create a new data.frame with 24*60 records, each record representing one
minute of the day, because sampling is done once per minute. Then fit all
records into their 'slots' by their nearest minute.
2. Pair each record with another that has a similar unixtime but a different
factor.
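Idea 1 can perhaps be sketched without an explicit loop by rounding each
unixtime to its nearest minute and joining the two factors on that slot with
merge(). This is only a minimal sketch: the sample data and the `minute`,
`f1`, `f2`, `paired` and `result` names are made up for illustration, and it
assumes unixtime is in seconds and that no factor has two samples rounding to
the same minute.

```r
# Hypothetical sample data: unixtime in seconds, about one reading per minute.
df <- data.frame(
  unixtime = c(60, 121, 179, 62, 182, 240),
  value    = c(10, 11, 12, 7, 9, 8),
  factor   = c("factor1", "factor1", "factor1",
               "factor2", "factor2", "factor2"),
  stringsAsFactors = FALSE
)

# Snap every timestamp to its nearest whole minute (the "slot").
df$minute <- round(df$unixtime / 60) * 60

# Split by factor, keeping only the slot and the value.
f1 <- df[df$factor == "factor1", c("minute", "value")]
f2 <- df[df$factor == "factor2", c("minute", "value")]

# Inner join on the slot: minutes missing from either factor drop out.
paired <- merge(f1, f2, by = "minute", suffixes = c(".f1", ".f2"))

result <- data.frame(
  unixtime = paired$minute,
  diff     = paired$value.f1 - paired$value.f2
)
```

Here only the slots present in both factors survive the join, so the
different sample counts stop being a problem. If a factor can have two
samples in one slot, they would need to be aggregated before the merge.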
As I picture them, both ideas require a for loop over individual records. It
feels too C-like to write a program that way. Is there a professional way to
do it in R? If not, I'd even prefer to rewrite the sampler (in C) so that it
does not discard invalid samples on-site, rather than mangle R.
Thanks.