[R] subsetting, aggregating and zoo
antonio rodriguez
antonio.raju at gmail.com
Thu Nov 2 08:58:06 CET 2006
Dear Gabor,
The solution below was very useful, but I have, I hope, one last
question about extracting some specific data: how can I count, in the
output below, the number of times each date is repeated? That is:
starts
[1] "1988-01-13" "1988-01-13" "1988-01-16" "1988-01-20" "1988-01-20"
[6] "1988-01-20" "1988-01-25" "1988-01-25" "1988-01-25" "1988-01-25"
dput(starts[1:10], control = "all")
structure(c(6586, 6586, 6589, 6593, 6593, 6593, 6598, 6598, 6598,
6598), class = "Date")
So I need to know how many times, for example, "1988-01-13" is repeated
(in this case, 2 times). I can't find which function to use.
Best regards,
Antonio
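One way to get these counts is base R's table(), which tabulates how often each distinct value occurs. Because starts is sorted, rle() (run-length encoding) gives the same counts. This is a minimal sketch rather than a reply from the thread: it rebuilds starts from the dput() output above, and the names counts and runs are illustrative.

```r
# Rebuild 'starts' from the dput() output shown above
starts <- structure(c(6586, 6586, 6589, 6593, 6593, 6593, 6598, 6598, 6598,
                      6598), class = "Date")

# table() counts each distinct date; the names are the dates as strings
counts <- table(starts)
print(counts)

# Count for one particular date, looked up by name
print(counts[["1988-01-13"]])

# rle() counts consecutive repeats; it needs a plain atomic vector,
# so the Date is converted to character first
runs <- rle(as.character(starts))
print(data.frame(date = runs$values, n = runs$lengths))
```

table() counts every occurrence wherever it appears, while rle() counts only consecutive repeats. The two agree here because each run's start date is repeated contiguously.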
Gabor Grothendieck wrote:
> Sorry, the line starting idx <- should have time(z) in place of z.
> That is,
>
> library(zoo)
> year <- as.Date(c(
> "1988-01-13", "1988-01-14", "1988-01-16", "1988-01-20", "1988-01-21",
> "1988-01-22", "1988-01-25", "1988-01-26", "1988-01-27", "1988-01-28"))
>
> x <- c(
> 7.973946, 9.933518, 7.978227, 7.512960, 6.641862, 5.667780,
> 5.721358, 6.863729, 9.600000, 9.049846)
>
> z <- zoo(x, year)
>
> idx <- cumsum(c(1, diff(time(z)) != 1))
>
> starts <- time(z)[match(idx, idx)]
> ends <- time(z)[cumsum(table(idx))[idx]]
>
> aggregate(z, starts, mean)
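The corrected recipe can also be sketched without zoo, assuming the dates sit in a plain Date vector (called dates here, an illustrative name) next to the values x. In this sketch tapply() stands in for aggregate() on a zoo series; the grouping step is the same cumsum(c(1, diff(...) != 1)) idiom, applied to the dates rather than the values.

```r
# Dates and values from the message above, as plain base-R vectors
dates <- as.Date(c(
  "1988-01-13", "1988-01-14", "1988-01-16", "1988-01-20", "1988-01-21",
  "1988-01-22", "1988-01-25", "1988-01-26", "1988-01-27", "1988-01-28"))
x <- c(7.973946, 9.933518, 7.978227, 7.512960, 6.641862, 5.667780,
       5.721358, 6.863729, 9.600000, 9.049846)

# A new run starts wherever the gap to the previous date is not 1 day
idx <- cumsum(c(1, diff(dates) != 1))

# Replace each date by the first (starts) or last (ends) date of its run
starts <- dates[match(idx, idx)]
ends   <- dates[cumsum(table(idx))[idx]]

# Mean of each run, labelled by its start date
run_means <- tapply(x, starts, mean)
print(run_means)
```

Grouping by ends instead of starts labels each run by its last date instead.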
>
>
> By the way, dput(v, control = "all") will output variable v
> in a form easily pastable by someone else into their session.
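As a small illustration of that round trip (a sketch; v, txt and v2 are illustrative names), the text printed by dput() can be parsed and evaluated back into an identical object, which is effectively what happens when someone pastes it into their session:

```r
v <- as.Date(c("1988-01-13", "1988-01-13", "1988-01-16"))

# Print an R expression that recreates v, e.g. for pasting into an email
dput(v, control = "all")

# Capture that text and evaluate it, as a reader pasting it would
txt <- capture.output(dput(v, control = "all"))
v2 <- eval(parse(text = txt))
print(identical(v, v2))
```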
>
> On 10/29/06, antonio rodriguez <antonio.raju at gmail.com> wrote:
>> Gabor Grothendieck wrote:
>> > Try this:
>> >
>> > library(zoo)
>> > # test data
>> > x <- c(1:4, 6:8, 10:14)
>> > z <- zoo(x, as.Date(x))
>> >
>> > # idx is 1 for first run, 2 for second run, etc.
>> > idx <- cumsum(c(1, diff(z) != 1))
>> >
>> > # starts replaces each time with the start time of that run
>> > # ends is similar but for ends
>> > starts <- time(z)[match(idx, idx)]
>> > ends <- time(z)[cumsum(table(idx))[idx]]
>> >
>> > # average over each run, using the time of the end of the run for the result
>> > # replace ends with starts if that is preferred
>> > aggregate(z, ends, mean)
>> >
>> Yes, it works for your example, but when I try it with my data I don't
>> get the expected result.
>>
>> is.zoo(z)
>> [1] TRUE
>>
>> attributes(z)
>> $index
>> [1] "1988-01-13" "1988-01-14" "1988-01-16" "1988-01-20" "1988-01-21"
>> ...
>>
>> [3861] "2005-12-20" "2005-12-23" "2005-12-24" "2005-12-25" "2005-12-26"
>> [3866] "2005-12-27" "2005-12-30"
>>
>> $class
>> [1] "zoo"
>>
>> z[1:10]
>>
>> 1988-01-13 1988-01-14 1988-01-16 1988-01-20 1988-01-21 1988-01-22 1988-01-25
>>   7.973946   9.933518   7.978227   7.512960   6.641862   5.667780   5.721358
>> 1988-01-26 1988-01-27 1988-01-28
>>   6.863729   9.600000   9.049846
>>
>> If I follow your instructions,
>>
>> idx <- cumsum(c(1, diff(z) != 1))
>> starts <- time(z)[match(idx, idx)]
>> ends <- time(z)[cumsum(table(idx))[idx]]
>>
>> s1 <- aggregate(z, starts, mean)
>> s1[1:10]
>>
>> 1988-01-13 1988-01-14 1988-01-16 1988-01-20 1988-01-21 1988-01-22 1988-01-25
>>   7.973946   9.933518   7.978227   7.512960   6.641862   5.667780   5.721358
>> 1988-01-26 1988-01-27 1988-01-28
>>   6.863729   9.600000   9.049846
>>
>> s2 <- aggregate(z, ends, mean)
>> s2[1:10]
>>
>> 1988-01-13 1988-01-14 1988-01-16 1988-01-20 1988-01-21 1988-01-22 1988-01-25
>>   7.973946   9.933518   7.978227   7.512960   6.641862   5.667780   5.721358
>> 1988-01-26 1988-01-27 1988-01-28
>>   6.863729   9.600000   9.049846
>>
>>
>> The result is always the same as the original series, and I don't know
>> why (there are no NAs in the series).
>>
>> Antonio
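A likely explanation, sketched in base R under the assumption that z holds these ten values on these dates: diff(z) differences the values of a zoo series, not its dates. Real measurements almost never differ by exactly 1, so cumsum(c(1, diff(...) != 1)) opens a new run at every observation and aggregate() returns the series unchanged. Differencing the dates, as diff(time(z)) does, gives the intended four runs. The names idx_values and idx_dates are illustrative.

```r
dates <- as.Date(c(
  "1988-01-13", "1988-01-14", "1988-01-16", "1988-01-20", "1988-01-21",
  "1988-01-22", "1988-01-25", "1988-01-26", "1988-01-27", "1988-01-28"))
x <- c(7.973946, 9.933518, 7.978227, 7.512960, 6.641862, 5.667780,
       5.721358, 6.863729, 9.600000, 9.049846)

# Grouping on the values, which is what diff(z) does for a zoo series z
idx_values <- cumsum(c(1, diff(x) != 1))
print(idx_values)  # 1 2 3 ... 10: every observation is its own run

# Grouping on the dates, which is what diff(time(z)) does
idx_dates <- cumsum(c(1, diff(dates) != 1))
print(idx_dates)   # 1 1 2 3 3 3 4 4 4 4: four runs of consecutive days
```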
>>
>
More information about the R-help
mailing list