[R] Multiple if function

Bert Gunter bgunter.4567 at gmail.com
Thu Sep 17 03:53:22 CEST 2015


Dénes:

A fair point! The only reason I have is ignorance -- I have not used
data.table. I am not surprised that it and perhaps other packages
(dplyr maybe?) can do things in a reasonable way very efficiently. The
only problem is that it requires us to learn yet another
package/paradigm. There may also be issues with its flexibility
compared to base R data structures, but, again, I must plead ignorance
here.

It is interesting that, apart from the unsplit() step that reconstructs the
original vector order, Chuck's base R solution is as efficient as data.table's.
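
Here is a minimal sketch of one way to avoid unsplit(). It assumes, as in Dénes's benchmark, that `dat` has a factor column ASB with levels A, B, C and no NAs. split() returns the groups in level order and keeps each group's rows in their original order. That is the same permutation order() gives for a factor, because order() leaves ties in their original order. So the unlist()ed per-group results can be written back with index assignment. The name fnSplitIndex and the toy data are hypothetical, and this variant has not been timed against unsplit() or data.table here.

```r
# Hypothetical variant of Chuck's fnSplit(): same split/mapply step, but the
# original row order is restored by index assignment instead of unsplit().
fnSplitIndex <- function(dat) {
  res <- mapply(
    function(x, z) eval(x, list(y = z)),
    expression(A = y * 2, B = y + 3, C = sqrt(y)),
    split(dat$Flow, dat$ASB),
    SIMPLIFY = FALSE)
  out <- numeric(nrow(dat))
  # unlist(res) lists group A's rows first, then B's, then C's, each in
  # original row order, which matches the row indices returned by order()
  out[order(dat$ASB)] <- unlist(res, use.names = FALSE)
  out
}

# Small check against the unsplit() version on made-up data
set.seed(1)
dat <- data.frame(ASB = factor(rep_len(LETTERS[1:3], 12)),
                  Flow = rnorm(12)^2)
viaUnsplit <- unsplit(
  mapply(function(x, z) eval(x, list(y = z)),
         expression(A = y * 2, B = y + 3, C = sqrt(y)),
         split(dat$Flow, dat$ASB),
         SIMPLIFY = FALSE),
  dat$ASB)
all.equal(fnSplitIndex(dat), viaUnsplit)
```

The design choice is only to replace the `split<-` loop inside unsplit() with a single vectorized assignment. Whether that closes the gap to data.table would need to be measured.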

Cheers,
Bert
Bert Gunter

"Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom."
   -- Clifford Stoll


On Wed, Sep 16, 2015 at 4:42 PM, Dénes Tóth <toth.denes at ttk.mta.hu> wrote:
>
>
> On 09/16/2015 04:41 PM, Bert Gunter wrote:
>>
>> Yes! Chuck's use of mapply is exactly the split/combine strategy I was
>> looking for. In retrospect, exactly how one should think about it.
>> Many thanks to all for a constructive discussion.
>>
>> -- Bert
>>
>>
>> Bert Gunter
>>
>>>>>
>>>>> Use mapply like this on large problems:
>>>>>
>>>>> unsplit(
>>>>>    mapply(
>>>>>        function(x,z) eval( x, list( y=z )),
>>>>>        expression( A=y*2, B=y+3, C=sqrt(y) ),
>>>>>        split( dat$Flow, dat$ASB ),
>>>>>        SIMPLIFY=FALSE),
>>>>>    dat$ASB)
>>>>>
>>>>> Chuck
>>>>>
>
>
> Is there any reason not to use data.table for this purpose, especially if
> efficiency is of concern?
>
> ---
>
> # load data.table and microbenchmark
> library(data.table)
> library(microbenchmark)
> #
> # prepare data
> DF <- data.frame(
>     ASB = factor(rep_len(LETTERS[1:3], 3e5)),
>     Flow = rnorm(3e5)^2)
> DT <- as.data.table(DF)
> DT[, ASB := as.character(ASB)]
> #
> # define functions
> #
> # Chuck's version
> fnSplit <- function(dat) {
>     unsplit(
>         mapply(
>             function(x,z) eval( x, list( y=z )),
>             expression( A=y*2, B=y+3, C=sqrt(y) ),
>             split( dat$Flow, dat$ASB ),
>             SIMPLIFY=FALSE),
>         dat$ASB)
> }
> #
> # data.table-way (IMHO, much easier to read)
> fnDataTable <- function(dat) {
>     dat[,
>         result :=
>             if (.BY$ASB == "A") {
>                 2 * Flow
>             } else if (.BY$ASB == "B") {
>                 3 + Flow
>             } else if (.BY$ASB == "C") {
>                 sqrt(Flow)
>             },
>         by = ASB]
> }
> #
> # benchmark
> #
> microbenchmark(fnSplit(DF), fnDataTable(DT))
> identical(fnSplit(DF), fnDataTable(DT)[, result])
>
> ---
>
> Actually, in Chuck's version the unsplit() part is the slow step. If row
> order is not a concern (e.g., DF is reordered before calling fnSplit),
> fnSplit is comparable to the data.table version.
>
>
> Denes
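
For readers new to the idiom in Chuck's mapply() version, here is a small standalone illustration with made-up values. Each element of expression(A = y * 2, ...) is an unevaluated call. eval(x, list(y = z)) evaluates that call with y bound to one group's values. mapply() pairs the i-th expression with the i-th group.

```r
# Each element of an expression vector is an unevaluated call
ex <- expression(A = y * 2, B = y + 3, C = sqrt(y))
ex[[2]]                          # the unevaluated call y + 3

# When envir is a list, eval() looks y up in that list
eval(ex[[2]], list(y = c(1, 2))) # returns 4 5

# mapply() pairs expression i with group i; names come from ex
groups <- list(A = c(1, 2), B = c(10, 20), C = c(4, 9))
res <- mapply(function(x, z) eval(x, list(y = z)),
              ex, groups, SIMPLIFY = FALSE)
res  # res$A is 2 4, res$B is 13 23, res$C is 2 3
```

In the thread, split(dat$Flow, dat$ASB) plays the role of `groups`. This only works because the expressions are listed in the same order as the factor levels.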


