[R] aggregating data with quality control

Sat Aug 31 13:25:35 CEST 2024

On Sat, 31 Aug 2024 11:15:10 +0000
Stefano Sofia <stefano.sofia using regione.marche.it> wrote:

> Evaluating the daily mean independently from the status is very easy:
> 
> aggregate(mydf$hs, by=list(format(mydf$data_POSIX, "%Y"),
> format(mydf$data_POSIX, "%m"), format(mydf$data_POSIX, "%d")),
> my.mean)
> 
> 
> Things become more complicated when I need to export also the status:
> this should be "C" when all 48 data have status equal to "C", and
> status "D" when at least one value has status ="D".
> 
> 
> I have no clue on how to do that in an efficient way.

You can make the status into an ordered factor:

# come up with some statuses
status <- sample(c('C', 'D'), 42, TRUE, c(.9, .1))

# convert them into factors, specifying that D is "more than" C
status <- ordered(status, c('C', 'D'))

Since the factor is ordered, its values can be compared (e.g.
status[1] < status[2]), so you can now use max() on your groups. If a
group contains any 'D's, max() will return 'D', because it's larger
than 'C'. If the group contains only 'C's, max() returns 'C'.
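
Here is a minimal sketch of how this could be combined with the daily
mean. I'm assuming the status lives in a column called mydf$status (I
don't know the real name), I use plain mean() in place of your my.mean,
and the data below are made up: three days of 48 half-hourly readings,
with one 'D' planted on the first and the third day.

```r
# Made-up example data: 3 days x 48 half-hourly readings
set.seed(1)
mydf <- data.frame(
  data_POSIX = seq(as.POSIXct("2024-01-01 00:00", tz = "UTC"),
                   by = "30 min", length.out = 3 * 48),
  hs = round(runif(3 * 48, 0, 100)),
  status = "C",                 # hypothetical column name
  stringsAsFactors = FALSE
)
mydf$status[c(5, 100)] <- "D"   # one 'D' on day 1, one on day 3

# Ordered factor: 'D' compares as greater than 'C'
mydf$status <- ordered(mydf$status, levels = c("C", "D"))

# One grouping key per calendar day
day <- format(mydf$data_POSIX, "%Y-%m-%d")

# tapply() returns its groups in sorted order of the key, so
# sort(unique(day)) lines up with both results
daily <- data.frame(
  day     = sort(unique(day)),
  hs_mean = as.vector(tapply(mydf$hs, day, mean)),
  # max() of an ordered factor is 'D' as soon as one 'D' is present
  status  = as.vector(tapply(mydf$status, day,
                             function(s) as.character(max(s)))),
  stringsAsFactors = FALSE
)
daily
```

With these made-up data, the status column of the result is 'D' for
the first and the third day and 'C' for the second one. The
as.character() call is only there to keep the result a plain character
vector regardless of how the per-group factors get combined.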

-- 
Best regards,
Ivan