[BioC] IRanges/List oddity: do.call of `c` on a list of IRangesList returns "list" only when the list is named

Hervé Pagès hpages at fhcrc.org
Thu Dec 13 03:46:40 CET 2012


Hi Malcolm,

I'm not sure what the reasons are for the current behaviour
of the c() generic, if they're just historical, or if there
is something deeper, or...

My view on the "primitive" status of a function is that it should
be an implementation detail, maybe an important one, but a
detail anyway, in the sense that being implemented as a .Primitive
or an .Internal or just in plain R should not affect the semantics
of a function. Interestingly, there is a short comment in ?.Primitive
suggesting that people's code should not depend on knowing which
functions are primitive, because this does change as R evolves.
Unfortunately the reality is very different: there are situations
where you definitely need to know that something is a primitive,
just because argument passing (and consequently method dispatch)
works differently.
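
For what it's worth, here is a quick way to see that difference from
the R prompt. It is only a sketch using base functions; the variable
names are mine and purely illustrative:

```r
# c() is a primitive: it has no R-level formals, so there is no
# argument list for a redefined generic to inherit.
c_is_primitive     <- is.primitive(c)
c_formals          <- formals(c)

# rbind() is an ordinary closure (it calls .Internal), so its
# formals are visible and a generic can be built on top of them.
rbind_is_primitive <- is.primitive(rbind)
rbind_formals      <- names(formals(rbind))

print(c_is_primitive)       # TRUE
print(c_formals)            # NULL
print(rbind_is_primitive)   # FALSE
print(rbind_formals)        # "..." "deparse.level"
```

This is consistent with what Michael said about rbind() vs c(): only
the former goes through the R-level dispatch code.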

On a more positive note, I found a hack that allows c() to dispatch
on ...:

   setGeneric("c", signature="...",
     function(..., recursive=FALSE)
         standardGeneric("c"),
     useAsDefault=function(..., recursive=FALSE)
                      base::c(..., recursive=recursive)
   )

Then:

   setClass("A", representation(aa="integer"))

   setMethod("c", "A",
     function(..., recursive=FALSE)
     {
         args <- list(...)
         ans_aa <- unlist(lapply(args, slot, "aa"), use.names=FALSE)
         new("A", aa=ans_aa)
     }
   )

   > a1 <- new("A", aa=1:3)
   > a2 <- new("A", aa=22:25)

   > c(a1, a2)
   An object of class "A"
   Slot "aa":
   [1]  1  2  3 22 23 24 25

   > c(a1, x=a2)
   An object of class "A"
   Slot "aa":
   [1]  1  2  3 22 23 24 25

   > c(A=a1, B=a2)
   An object of class "A"
   Slot "aa":
   [1]  1  2  3 22 23 24 25

Overriding base::c() with our own c() is pretty invasive though, and
I didn't test it enough to guarantee that it doesn't break things or
slow them down.
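
A much less invasive option, for the do.call() case that started this
thread, is to keep the usual 'x'-based "c" method and strip the names
before the call, as Malcolm did with unname(). A minimal sketch,
assuming a made-up class "Toy" and a made-up helper combine_unnamed()
(neither exists in any package):

```r
library(methods)

# Hypothetical toy class standing in for an IRanges-like S4 object.
setClass("Toy", representation(aa = "integer"))

# Ordinary "c" method: the implicit generic dispatches on 'x' only.
setMethod("c", "Toy",
    function(x, ..., recursive = FALSE) {
        args <- list(x, ...)
        new("Toy", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    }
)

# Hypothetical helper: drop the names so that the first element is
# matched positionally to 'x' and dispatch can succeed.
combine_unnamed <- function(objects) do.call(c, unname(objects))

t1 <- new("Toy", aa = 1:3)
t2 <- new("Toy", aa = 22:25)
named <- list(x1 = t1, x2 = t2)

plain  <- do.call(c, named)        # nothing goes to 'x': default method
merged <- combine_unnamed(named)   # reaches the "Toy" method

print(class(plain))
print(class(merged))
```

The obvious downside is that the names are lost, so this only helps
when they are not needed in the result.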

Also, one important thing to note is that this signature doesn't
allow specific methods to implement extra arguments (like the "c"
method for GenomicRanges does). This kind of makes sense, because
the generic function puts every named arg that is not named
'recursive' in ... and dispatches on it. The same restriction
applies to the cbind() and rbind() generics:

   > setMethod("cbind", "A", function(..., deparse.level=1, my.toggle=FALSE) NULL)
   Creating a generic function for ‘cbind’ from package ‘base’ in the global environment
   in method for ‘cbind’ with signature ‘"A"’: no definition for class “A”
   Error in rematchDefinition(definition, fdef, mnames, fnames, signature) :
     arguments (deparse.level) after '...' in the generic must appear in the method, in the same place at the end of the argument list

So some of the "c" methods would need to be revisited.
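
To illustrate why extra arguments don't mix with dispatch on ...,
here is a small sketch on a non-primitive generic. The class "Toy"
and the generic combineAll() are made up for this example and are
not part of BiocGenerics or any other package:

```r
library(methods)

# Hypothetical toy class and generic, for illustration only.
setClass("Toy", representation(aa = "integer"))

setGeneric("combineAll", signature = "...",
    def = function(..., recursive = FALSE) standardGeneric("combineAll")
)

setMethod("combineAll", "Toy",
    function(..., recursive = FALSE) {
        args <- list(...)
        new("Toy", aa = unlist(lapply(args, slot, "aa"), use.names = FALSE))
    }
)

t1 <- new("Toy", aa = 1:3)
t2 <- new("Toy", aa = 22:25)

# Named arguments end up in '...' and still reach the "Toy" method.
ok <- combineAll(A = t1, B = t2)

# An extra named argument also ends up in '...' and takes part in
# dispatch, so no method matches and the call fails.
bad <- tryCatch(combineAll(t1, t2, my.toggle = TRUE),
                error = function(e) e)

print(ok@aa)
print(inherits(bad, "error"))
```

So any option that a method wants to support has to be a formal
argument of the generic itself, like 'recursive' here.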

Anyway, this would need serious testing before adding this generic to
BiocGenerics. Is it worth it?

Cheers,
H.


On 12/03/2012 12:11 PM, Cook, Malcolm wrote:
> Steve, Michael, Herve, all
>
> As always, “illuminating”.
>
> And, as often, frustrating.
>
> I am clear how unname serves as a workaround for my current purpose.
> So, I can proceed.
>
> But, I remain unclear whether this (to me, odd) behavior of `base::c` is
> desirable or justifiable in any sense of the word.  Is this informed by
> a rational language design, or, as Mike suggests, the result of layering
> an OO design onto a functional base?
>
> In your opinion, should this issue be raised on R-devel?  Or is it a
> “waste of time”?
>
> Thanks for your thoughts/help.
>
> ~Malcolm
>
> *From:*Michael Lawrence [mailto:lawrence.michael at gene.com]
> *Sent:* Monday, December 03, 2012 11:31 AM
> *To:* Hervé Pagès
> *Cc:* Cook, Malcolm; bioconductor at r-project.org
> *Subject:* Re: [BioC] IRanges/List oddity: do.call of `c` on a list of
> IRangesList returns "list" only when the list is named
>
> On Fri, Nov 30, 2012 at 3:28 PM, Hervé Pagès <hpages at fhcrc.org> wrote:
>
> Hi Malcolm,
>
> The problem you are describing can be reproduced by calling c()
> directly on S4 objects.
>
>    * With unnamed arguments:
>
>      > c(IRanges(), IRanges())
>      IRanges of length 0
>
>      > c(Rle(), Rle())
>      logical-Rle of length 0 with 0 runs
>        Lengths:
>        Values :
>
>    * With named arguments:
>
>      > c(a=IRanges(),b=IRanges())
>      $a
>      IRanges of length 0
>
>      $b
>      IRanges of length 0
>
>      > c(a=Rle(), b=Rle())
>      $a
>      logical-Rle of length 0 with 0 runs
>        Lengths:
>        Values :
>
>      $b
>      logical-Rle of length 0 with 0 runs
>        Lengths:
>        Values :
>
> This statement (found in man page for base::c()) is showing what the
> root of the problem is:
>
>    S4 methods:
>
>       This function is S4 generic, but with argument list ‘(x, ...,
>       recursive = FALSE)’.
>
> Note that, to make things a little bit more confusing, it's not totally
> accurate that c() is an S4 generic, at least not on a fresh session:
>
>    > isGeneric("c")
>    [1] FALSE
>
> So my understanding of the above statement is that c() will
> automatically be turned into an S4 generic at the moment you try
> to define an S4 method for it, and, for obscure reasons that I'm not
> sure I understand, the argument list used in the definition of this
> S4 method must start with 'x'. The consequence of all this is that
> dispatch will happen on 'x' so if named arguments are passed with
> a name that is not 'x', dispatch will fail and the default method
> (which is base::c()) will be called :-b
>
> This explains why things work as expected in the following situations:
>
>    > c(IRanges(), b=IRanges())
>    IRanges of length 0
>
>    > c(a=IRanges(), IRanges())
>    IRanges of length 0
>
>    > c(a=IRanges(), x=IRanges())
>    IRanges of length 0
>
> But when all the arguments are named with names != 'x', then nothing
> is passed to 'x' and dispatch fails.
>
> I didn't have much luck so far with my attempts to work around this:
>
>    1. Trying to change the signature of the c() generic:
>
>       > setGeneric("c", signature="...")
>       Error in setGeneric("c", signature = "...") :
>         ‘c’ is a primitive function;  methods can be defined, but
>        the generic function is implicit, and cannot be changed.
>
>    2. Trying to dispatch on "missing" or "ANY":
>
>       > setMethod("c", "missing", function(x, ..., recursive=FALSE) "YES!")
>       Error in setMethod("c", "missing", function(x, ..., recursive = FALSE) "YES!") :
>         the method for function ‘c’ and signature x="missing" is sealed and cannot be re-defined
>
>       > setMethod("c", "ANY", function(x, ..., recursive=FALSE) "YES!")
>       Error in setMethod("c", "ANY", function(x, ..., recursive = FALSE) "YES!") :
>         the method for function ‘c’ and signature x="ANY" is sealed and cannot be re-defined
>
> With old versions of R dispatch on ... was not possible i.e. ... was not
> allowed to be in the signature of the generic. This was changed in
> recent versions of R and we're already using this new feature for a
> few S4 generics defined in BiocGenerics e.g. for cbind() and rbind():
>
>    > library(BiocGenerics)
>    > rbind
>    standardGeneric for "rbind" defined from package "BiocGenerics"
>
>    function (..., deparse.level = 1)
>    standardGeneric("rbind")
>    <environment: 0x29b96b0>
>    Methods may be defined for arguments: ...
>    Use  showMethods("rbind")  for currently available ones.
>
> And dispatch works as expected, with or without named arguments:
>
>    > rbind(a=DataFrame(X=1:3, Y=11:13), b=DataFrame(X=1:3, Y=21:23))
>    DataFrame with 6 rows and 2 columns
>              X         Y
>      <integer> <integer>
>    1         1        11
>    2         2        12
>    3         3        13
>    4         1        21
>    5         2        22
>    6         3        23
>
>    > rbind(DataFrame(X=1:3, Y=11:13), DataFrame(X=1:3, Y=21:23))
>    DataFrame with 6 rows and 2 columns
>              X         Y
>      <integer> <integer>
>    1         1        11
>    2         2        12
>    3         3        13
>    4         1        21
>    5         2        22
>    6         3        23
>
> So I wonder if the weird behavior of c() is still justified.
>
> Comments/suggestions to address this are welcome.
>
>
>
> The issue is that (unlike 'rbind')  'c' is a primitive and dispatch for
> primitives is hard-coded in C. C-level dispatch is a simplified variant
> of the R implementation, so I'm guessing it does not work with "...".
>
> Btw, you can get a peek at the 'c' generic with:
>  > getGeneric("c")
> standardGeneric for "c" defined from package "base"
>
> function (x, ..., recursive = FALSE)
> standardGeneric("c", .Primitive("c"))
> <bytecode: 0x382af20>
> <environment: 0x34d6878>
> Methods may be defined for arguments: x, recursive
> Use  showMethods("c")  for currently available ones.
>
> Michael
>
>     Thanks,
>     H.
>
>
>
>
>     On 11/30/2012 11:56 AM, Cook, Malcolm wrote:
>
>     Hi,
>
>     The following shows that do.call of `c` on a list of IRangesList
>     returns "list" only when the list is named.
>
>     library(IRanges)
>     example(IRangesList)
>     class(x)
>
>     [1] "CompressedIRangesList"
>     attr(,"package")
>     [1] "IRanges"
>
>     class(do.call(c,list(x1=x,x2=x)))
>
>     [1] "list"
>
>     I am confused by this.
>
>     I would not expect the fact that the list is named to have any
>     impact on the result.
>
>     But look: when the list names are omitted, the class is now an IRangesList:
>
>     class(do.call(c,list(x,x)))
>
>     [1] "CompressedIRangesList"
>     attr(,"package")
>     [1] "IRanges"
>
>     class(c(x,x))
>
>     [1] "CompressedIRangesList"
>     attr(,"package")
>     [1] "IRanges"
>
>     A 'workaround' is to unname the list, as demonstrated:
>
>     class(do.call(c,unname(list(x1=x,x2=x))))
>
>     [1] "CompressedIRangesList"
>     attr(,"package")
>     [1] "IRanges"
>
>     But, why does having a 'names' attribute affect the behavior of
>     do.calling `c` so much as to change the class returned?
>
>
>     Thanks for your help/education.....
>
>     Malcolm Cook
>     Computational Biology - Stowers Institute for Medical Research
>
>     sessionInfo()
>
>     R version 2.15.1 (2012-06-22)
>     Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>
>     locale:
>     [1] C
>
>     attached base packages:
>     [1] stats     graphics  grDevices utils     datasets  methods   base
>
>     other attached packages:
>     [1] IRanges_1.16.4     BiocGenerics_0.4.0
>
>     loaded via a namespace (and not attached):
>      [1] AnnotationDbi_1.20.3   BSgenome_1.26.1        Biobase_2.18.0
>      [4] Biostrings_2.26.2      DBI_0.2-5              GenomicFeatures_1.10.1
>      [7] GenomicRanges_1.10.5   RCurl_1.95-3           RSQLite_0.11.2
>     [10] Rsamtools_1.10.2       XML_3.95-0.1           biomaRt_2.14.0
>     [13] bitops_1.0-4.2         colorspace_1.2-0       data.table_1.8.6
>     [16] functional_0.1         graph_1.36.1           gtools_2.7.0
>     [19] parallel_2.15.1        rtracklayer_1.18.1     stats4_2.15.1
>     [22] tools_2.15.1           zlibbioc_1.4.0
>
>
>     _______________________________________________
>     Bioconductor mailing list
>     Bioconductor at r-project.org
>     https://stat.ethz.ch/mailman/listinfo/bioconductor
>     Search the archives:
>     http://news.gmane.org/gmane.science.biology.informatics.conductor
>
>     --
>     Hervé Pagès
>
>     Program in Computational Biology
>     Division of Public Health Sciences
>     Fred Hutchinson Cancer Research Center
>     1100 Fairview Ave. N, M1-B514
>     P.O. Box 19024
>     Seattle, WA 98109-1024
>
>     E-mail: hpages at fhcrc.org
>     Phone: (206) 667-5791
>     Fax: (206) 667-1319
>
>

-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpages at fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319


