[BioC] BUG in Genomic(Features|Ranges): names(unlist(transcriptsBy(txdb, 'gene'))) is UNRELIABLE!!!

Cook, Malcolm MEC at stowers.org
Tue Sep 4 18:01:31 CEST 2012


Martin,

Ah, so sorry.  The BUG is in my understanding of base R.

Thanks for your time to fix my thinking.

Gracias,

Malcolm

On 9/1/12 8:35 AM, "Martin Morgan" <mtmorgan at fhcrc.org> wrote:

>On 08/31/2012 10:07 PM, Cook, Malcolm wrote:
>> Careful fellow travelers,
>>
>> I find that unlisting the GenomicRanges returned from a call to
>>`transcriptsBy` returns a list with names that are gene names... only
>>they are incorrect!
>>
>> Look:
>>
>>> txdb<-makeTranscriptDbFromBiomart(biomart="ensembl",
>>>     dataset="dmelanogaster_gene_ensembl")
>> ...
>>> transcriptsBy(txdb,'gene')[2]
>> GRangesList of length 1:
>> $FBgn0000008
>> GRanges with 3 ranges and 2 elementMetadata cols:
>>        seqnames               ranges strand |     tx_id     tx_name
>>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>>    [1]       2R [18024494, 18060339]      + |      8616 FBtr0100521
>>    [2]       2R [18024496, 18060346]      + |      8615 FBtr0071763
>>    [3]       2R [18024938, 18060346]      + |      8617 FBtr0071764
>> ...
>>> unlist(transcriptsBy(txdb,'gene')[2])
>> GRanges with 3 ranges and 2 elementMetadata cols:
>>                 seqnames               ranges strand |     tx_id     tx_name
>>                    <Rle>            <IRanges>  <Rle> | <integer> <character>
>>     FBgn0000008       2R [18024494, 18060339]      + |      8616 FBtr0100521
>>    FBgn00000081       2R [18024496, 18060346]      + |      8615 FBtr0071763
>>    FBgn00000082       2R [18024938, 18060346]      + |      8617 FBtr0071764
>> ...
>>
>>
>> Arguably, those names on the GRanges should either all be the same,
>>namely FBgn0000008, or they should not be returned.
>
>This is the way unlist works in base R
>
> > unlist(list(a=1:2))
>a1 a2
>  1  2
>
>but the behavior has been changed in devel (to be released in early
>October)
>
> > unlist(GRangesList(A=GRanges("a", IRanges(1:2, 10))))
>GRanges with 2 ranges and 0 metadata columns:
>     seqnames    ranges strand
>        <Rle> <IRanges>  <Rle>
>   A        a   [1, 10]      *
>   A        a   [2, 10]      *
>   ---
>   seqlengths:
>     a
>    NA
>
>the work-around, as in base R, is to add use.names=FALSE to unlist
>(perhaps adding a metadata column of rep(names(x), elementLengths(x)),
>where x is the GRangesList returned by transcriptsBy).
>
>> This 'bug' has been around for some time.  I meant to report it, and
>>just tripped over it again.
>>
>> Can fix?
>>
>> Thanks!
>>
>> Malcolm
>>
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)
>>
>> locale:
>> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
>>
>> attached base packages:
>>   [1] tools     splines   parallel  stats     graphics  grDevices utils
>>    datasets  methods   base
>>
>> other attached packages:
>>   [1] igraph_0.6-2          log4r_0.1-4           vwr_0.1
>>RecordLinkage_0.4-1   ffbase_0.5            ff_2.2-7
>>bit_1.1-8             evd_2.2-7             ipred_0.8-13
>>prodlim_1.3.1         KernSmooth_2.23-8     nnet_7.3-4
>>survival_2.36-14      mlbench_2.1-1         MASS_7.3-20
>>ada_2.0-3             rpart_3.1-54          e1071_1.6
>>class_7.3-4           XLConnect_0.2-0       XLConnectJars_0.2-0
>>rJava_0.9-3           latticeExtra_0.6-19   RColorBrewer_1.0-5
>>lattice_0.20-6        doMC_1.2.5            multicore_0.1-7
>> [28] SRAdb_1.10.0          RCurl_1.91-1          bitops_1.0-4.1
>>graph_1.34.0          BSgenome_1.24.0       rtracklayer_1.16.3
>>Rsamtools_1.8.6       Biostrings_2.24.1     GenomicFeatures_1.8.2
>>AnnotationDbi_1.19.31 GenomicRanges_1.8.12  R.utils_1.16.0
>>R.oo_1.9.8            R.methodsS3_1.4.2     IRanges_1.14.4
>>Biobase_2.17.7        BiocGenerics_0.3.1    data.table_1.8.2
>>compare_0.2-3         svUnit_0.7-10         doParallel_1.0.1
>>iterators_1.0.6       foreach_1.4.0         ggplot2_0.9.1
>>sqldf_0.4-6.4         RSQLite.extfuns_0.0.1 RSQLite_0.11.1
>> [55] chron_2.3-42          gsubfn_0.6-4          proto_0.3-9.2
>>DBI_0.2-5             functional_0.1        reshape_0.8.4
>>plyr_1.7.1            stringr_0.6.1         gtools_2.7.0
>>
>> loaded via a namespace (and not attached):
>>   [1] biomaRt_2.12.0   codetools_0.2-8  colorspace_1.1-1
>>compiler_2.15.0  dichromat_1.2-4  digest_0.5.2     GEOquery_2.23.5
>>grid_2.15.0      labeling_0.1     memoise_0.1      munsell_0.3
>>reshape2_1.2.1   scales_0.2.1     stats4_2.15.0    tcltk_2.15.0
>>XML_3.9-4        zlibbioc_1.2.0
>>>
>
>
>-- 
>Computational Biology / Fred Hutchinson Cancer Research Center
>1100 Fairview Ave. N.
>PO Box 19024 Seattle, WA 98109
>
>Location: Arnold Building M1 B861
>Phone: (206) 667-2793


