[BioC] Odd behaviour with renameSeqlevels
    Valerie Obenchain 
    vobencha at fhcrc.org
       
    Wed May  2 19:46:25 CEST 2012
    
    
  
I'm sorry Alex, I missed your point the first time.  Yes, there was a 
bug in renameSeqlevels() wrt changing the chromosome names when the 
renaming vector was out of order with 'x'.
This is now fixed in release 1.8.5 / devel 1.9.13. Thanks for reporting this.
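
For anyone reading the archive later, the class of bug can be illustrated with plain character vectors. This is only a base R sketch, not the actual renameSeqlevels() code; the names old_levels, rename_map, positional and by_name are made up for the example, and only four of the seqlevels are used.

```r
# Seqlevels in the order reported for txbygene (chrX sits between chr7 and chr8).
old_levels <- c("chr7", "chrX", "chr8", "chr9")

# Renaming vector written in "natural" chromosome order, as in the report.
rename_map <- c(chr7 = "7", chr8 = "8", chr9 = "9", chrX = "X")

# Positional renaming: the names of rename_map are ignored, so every level
# after chr7 receives the wrong new name.
positional <- unname(rename_map)

# Name-based renaming: look each old level up by name; levels without an
# entry in rename_map keep their original name.
by_name <- old_levels
hit <- match(old_levels, names(rename_map))
by_name[!is.na(hit)] <- rename_map[hit[!is.na(hit)]]

print(setNames(positional, old_levels))
print(setNames(by_name, old_levels))
```

With the positional version chr8 ends up as "9", which is the off-by-one Alex saw; the lookup by name gives the intended mapping.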
Valerie
On 05/02/2012 09:25 AM, Valerie Obenchain wrote:
> Hi Alex,
>
> The ordering of the chromosome names displayed by seqlevels() comes 
> from the seqinfo object in the txdb. The ordering in the txdb or the 
> txbygene before renaming is the same as after the renaming.
>
>   library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>   txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>   seqinfo(txdb)
>   seqlevels(txdb)
>
>   txbygene = transcriptsBy(txdb,"gene")
>   seqinfo(txbygene)
>   seqlevels(txbygene)
>
> This is not necessarily the same order as the seqnames (i.e., order of 
> the ranges) in the txbygene object.
>   seqnames(txbygene)
>
> Renaming the seqlevels has not changed the order of your txbygene 
> ranges if that was the concern. No, the renaming vector does not need 
> to match the ordering of the original names.
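
The difference between the two orderings can be seen with an ordinary factor, used here only as a rough stand-in for seqnames/seqlevels. This is toy data in base R, and chrom, map and renamed are names invented for the example:

```r
# A factor stands in for seqnames; its levels stand in for seqlevels.
chrom <- factor(c("chr8", "chr7", "chr8", "chrX"),
                levels = c("chr7", "chrX", "chr8"))

levels(chrom)        # order of the levels
as.character(chrom)  # order of the values (the "ranges")

# Rename the levels by name; the map is deliberately out of order.
map <- c(chr8 = "8", chrX = "X", chr7 = "7")
renamed <- chrom
levels(renamed) <- unname(map[levels(chrom)])

levels(renamed)        # levels renamed, same level order as before
as.character(renamed)  # values renamed, same value order as before
```

The level order and the value order are independent of each other, and a rename that looks names up by name changes neither.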
>
> Here is another way to prepare your new seqlevel names,
>
>   nms <- seqlevels(txbygene)[1:24]
>   vlu <- sub("chr", "", seqlevels(txbygene)[1:24], fixed=TRUE)
>   names(vlu) <- nms
>   renameSeqlevels(txbygene, vlu)
>
> Valerie
>
>
> On 05/02/2012 04:43 AM, Alex Gutteridge wrote:
>> Is this a bug in renameSeqlevels or expected behaviour? Note the 
>> weird ordering of chromosome names in txbygene (chrX between chr7 and 
>> chr8) which then results in misnaming when I try to use 
>> renameSeqlevels (everything after chr7 is off by one). The docs for 
>> renameSeqlevels aren't explicit about whether the renaming vector has to 
>> match the ordering of the original names, but I thought the point of 
>> making it a named vector is that it doesn't?
>>
>>> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
>> Loading required package: GenomicFeatures
>> Loading required package: BiocGenerics
>>
>> Attaching package: ‘BiocGenerics’
>>
>> The following object(s) are masked from ‘package:stats’:
>>
>>     xtabs
>>
>> The following object(s) are masked from ‘package:base’:
>>
>>     anyDuplicated, cbind, colnames, duplicated, eval, Filter, Find,
>>     get, intersect, lapply, Map, mapply, mget, order, paste, pmax,
>>     pmax.int, pmin, pmin.int, Position, rbind, Reduce, rep.int,
>>     rownames, sapply, setdiff, table, tapply, union, unique
>>
>> Loading required package: IRanges
>> Loading required package: GenomicRanges
>> Loading required package: AnnotationDbi
>> Loading required package: Biobase
>> Welcome to Bioconductor
>>
>>     Vignettes contain introductory material; view with
>>     'browseVignettes()'. To cite Bioconductor, see
>>     'citation("Biobase")', and for packages 'citation("pkgname")'.
>>
>>> txdb = TxDb.Hsapiens.UCSC.hg19.knownGene
>>> txbygene = transcriptsBy(txdb,"gene")
>>> tx = renameSeqlevels(txbygene,c("chr1"="1","chr2"="2","chr3"="3","chr4"="4",
>> +                                 "chr5"="5","chr6"="6","chr7"="7","chr8"="8",
>> +                                 "chr9"="9","chr10"="10","chr11"="11","chr12"="12",
>> +                                 "chr13"="13","chr14"="14","chr15"="15","chr16"="16",
>> +                                 "chr17"="17","chr18"="18","chr19"="19","chr20"="20",
>> +                                 "chr21"="21","chr22"="22","chrX"="X"))
>>> seqlevels(txbygene)
>>  [1] "chr1"                  "chr2"                  "chr3"
>>  [4] "chr4"                  "chr5"                  "chr6"
>>  [7] "chr7"                  "chrX"                  "chr8"
>> [10] "chr9"                  "chr10"                 "chr11"
>> [13] "chr12"                 "chr13"                 "chr14"
>> [16] "chr15"                 "chr16"                 "chr17"
>> [19] "chr18"                 "chr20"                 "chrY"
>> [22] "chr19"                 "chr22"                 "chr21"
>> [25] "chr6_ssto_hap7"        "chr6_mcf_hap5"         "chr6_cox_hap2"
>> [28] "chr6_mann_hap4"        "chr6_apd_hap1"         "chr6_qbl_hap6"
>> [31] "chr6_dbb_hap3"         "chr17_ctg5_hap1"       "chr4_ctg9_hap1"
>> [34] "chr1_gl000192_random"  "chrUn_gl000225"        "chr4_gl000194_random"
>> [37] "chr4_gl000193_random"  "chr9_gl000200_random"  "chrUn_gl000222"
>> [40] "chrUn_gl000212"        "chr7_gl000195_random"  "chrUn_gl000223"
>> [43] "chrUn_gl000224"        "chrUn_gl000219"        "chr17_gl000205_random"
>> [46] "chrUn_gl000215"        "chrUn_gl000216"        "chrUn_gl000217"
>> [49] "chr9_gl000199_random"  "chrUn_gl000211"        "chrUn_gl000213"
>> [52] "chrUn_gl000220"        "chrUn_gl000218"        "chr19_gl000209_random"
>> [55] "chrUn_gl000221"        "chrUn_gl000214"        "chrUn_gl000228"
>> [58] "chrUn_gl000227"        "chr1_gl000191_random"  "chr19_gl000208_random"
>> [61] "chr9_gl000198_random"  "chr17_gl000204_random" "chrUn_gl000233"
>> [64] "chrUn_gl000237"        "chrUn_gl000230"        "chrUn_gl000242"
>> [67] "chrUn_gl000243"        "chrUn_gl000241"        "chrUn_gl000236"
>> [70] "chrUn_gl000240"        "chr17_gl000206_random" "chrUn_gl000232"
>> [73] "chrUn_gl000234"        "chr11_gl000202_random" "chrUn_gl000238"
>> [76] "chrUn_gl000244"        "chrUn_gl000248"        "chr8_gl000196_random"
>> [79] "chrUn_gl000249"        "chrUn_gl000246"        "chr17_gl000203_random"
>> [82] "chr8_gl000197_random"  "chrUn_gl000245"        "chrUn_gl000247"
>> [85] "chr9_gl000201_random"  "chrUn_gl000235"        "chrUn_gl000239"
>> [88] "chr21_gl000210_random" "chrUn_gl000231"        "chrUn_gl000229"
>> [91] "chrM"                  "chrUn_gl000226"        "chr18_gl000207_random"
>>> seqlevels(tx)
>>  [1] "1"                     "2"                     "3"
>>  [4] "4"                     "5"                     "6"
>>  [7] "7"                     "8"                     "9"
>> [10] "10"                    "11"                    "12"
>> [13] "13"                    "14"                    "15"
>> [16] "16"                    "17"                    "18"
>> [19] "19"                    "20"                    "chrY"
>> [22] "21"                    "22"                    "X"
>> [25] "chr6_ssto_hap7"        "chr6_mcf_hap5"         "chr6_cox_hap2"
>> [28] "chr6_mann_hap4"        "chr6_apd_hap1"         "chr6_qbl_hap6"
>> [31] "chr6_dbb_hap3"         "chr17_ctg5_hap1"       "chr4_ctg9_hap1"
>> [34] "chr1_gl000192_random"  "chrUn_gl000225"        "chr4_gl000194_random"
>> [37] "chr4_gl000193_random"  "chr9_gl000200_random"  "chrUn_gl000222"
>> [40] "chrUn_gl000212"        "chr7_gl000195_random"  "chrUn_gl000223"
>> [43] "chrUn_gl000224"        "chrUn_gl000219"        "chr17_gl000205_random"
>> [46] "chrUn_gl000215"        "chrUn_gl000216"        "chrUn_gl000217"
>> [49] "chr9_gl000199_random"  "chrUn_gl000211"        "chrUn_gl000213"
>> [52] "chrUn_gl000220"        "chrUn_gl000218"        "chr19_gl000209_random"
>> [55] "chrUn_gl000221"        "chrUn_gl000214"        "chrUn_gl000228"
>> [58] "chrUn_gl000227"        "chr1_gl000191_random"  "chr19_gl000208_random"
>> [61] "chr9_gl000198_random"  "chr17_gl000204_random" "chrUn_gl000233"
>> [64] "chrUn_gl000237"        "chrUn_gl000230"        "chrUn_gl000242"
>> [67] "chrUn_gl000243"        "chrUn_gl000241"        "chrUn_gl000236"
>> [70] "chrUn_gl000240"        "chr17_gl000206_random" "chrUn_gl000232"
>> [73] "chrUn_gl000234"        "chr11_gl000202_random" "chrUn_gl000238"
>> [76] "chrUn_gl000244"        "chrUn_gl000248"        "chr8_gl000196_random"
>> [79] "chrUn_gl000249"        "chrUn_gl000246"        "chr17_gl000203_random"
>> [82] "chr8_gl000197_random"  "chrUn_gl000245"        "chrUn_gl000247"
>> [85] "chr9_gl000201_random"  "chrUn_gl000235"        "chrUn_gl000239"
>> [88] "chr21_gl000210_random" "chrUn_gl000231"        "chrUn_gl000229"
>> [91] "chrM"                  "chrUn_gl000226"        "chr18_gl000207_random"
>>> txbygene$'5327'
>> GRanges with 6 ranges and 2 elementMetadata cols:
>>       seqnames               ranges strand |     tx_id     tx_name
>>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>>   [1]     chr8 [42032236, 42050729]      - |     31953  uc010lxf.1
>>   [2]     chr8 [42032236, 42050729]      - |     31954  uc010lxg.1
>>   [3]     chr8 [42032236, 42065194]      - |     31955  uc003xos.2
>>   [4]     chr8 [42032236, 42065194]      - |     31956  uc003xot.2
>>   [5]     chr8 [42032236, 42065194]      - |     31957  uc011lcm.1
>>   [6]     chr8 [42032236, 42065194]      - |     31958  uc011lcn.1
>>   ---
>>   seqlengths:
>>                     chr1                  chr2 ... chr18_gl000207_random
>>                249250621             243199373 ...                  4262
>>> tx$'5327'
>> GRanges with 6 ranges and 2 elementMetadata cols:
>>       seqnames               ranges strand |     tx_id     tx_name
>>          <Rle>            <IRanges>  <Rle> | <integer> <character>
>>   [1]        9 [42032236, 42050729]      - |     31953  uc010lxf.1
>>   [2]        9 [42032236, 42050729]      - |     31954  uc010lxg.1
>>   [3]        9 [42032236, 42065194]      - |     31955  uc003xos.2
>>   [4]        9 [42032236, 42065194]      - |     31956  uc003xot.2
>>   [5]        9 [42032236, 42065194]      - |     31957  uc011lcm.1
>>   [6]        9 [42032236, 42065194]      - |     31958  uc011lcn.1
>>   ---
>>   seqlengths:
>>                        1                     2 ... chr18_gl000207_random
>>                249250621             243199373 ...                  4262
>>> txbygene$'1956'
>> GRanges with 11 ranges and 2 elementMetadata cols:
>>        seqnames               ranges strand |     tx_id     tx_name
>>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>>    [1]     chr7 [55086725, 55224644]      + |     28336  uc003tqh.3
>>    [2]     chr7 [55086725, 55236328]      + |     28337  uc003tqi.3
>>    [3]     chr7 [55086725, 55238738]      + |     28338  uc003tqj.3
>>    [4]     chr7 [55086725, 55270769]      + |     28339  uc022adm.1
>>    [5]     chr7 [55086725, 55270769]      + |     28340  uc010kzg.2
>>    [6]     chr7 [55086725, 55275031]      + |     28341  uc003tqk.3
>>    [7]     chr7 [55086725, 55275031]      + |     28342  uc022adn.1
>>    [8]     chr7 [55177540, 55275031]      + |     28343  uc011kco.2
>>    [9]     chr7 [55224226, 55238906]      + |     28345  uc011kcq.1
>>   [10]     chr7 [55224226, 55238906]      + |     28346  uc011kcp.1
>>   [11]     chr7 [55248979, 55259567]      + |     28349  uc022ado.1
>>   ---
>>   seqlengths:
>>                     chr1                  chr2 ... chr18_gl000207_random
>>                249250621             243199373 ...                  4262
>>> tx$'1956'
>> GRanges with 11 ranges and 2 elementMetadata cols:
>>        seqnames               ranges strand |     tx_id     tx_name
>>           <Rle>            <IRanges>  <Rle> | <integer> <character>
>>    [1]        7 [55086725, 55224644]      + |     28336  uc003tqh.3
>>    [2]        7 [55086725, 55236328]      + |     28337  uc003tqi.3
>>    [3]        7 [55086725, 55238738]      + |     28338  uc003tqj.3
>>    [4]        7 [55086725, 55270769]      + |     28339  uc022adm.1
>>    [5]        7 [55086725, 55270769]      + |     28340  uc010kzg.2
>>    [6]        7 [55086725, 55275031]      + |     28341  uc003tqk.3
>>    [7]        7 [55086725, 55275031]      + |     28342  uc022adn.1
>>    [8]        7 [55177540, 55275031]      + |     28343  uc011kco.2
>>    [9]        7 [55224226, 55238906]      + |     28345  uc011kcq.1
>>   [10]        7 [55224226, 55238906]      + |     28346  uc011kcp.1
>>   [11]        7 [55248979, 55259567]      + |     28349  uc022ado.1
>>   ---
>>   seqlengths:
>>                        1                     2 ... chr18_gl000207_random
>>                249250621             243199373 ...                  4262
>>> sessionInfo()
>> R version 2.15.0 (2012-03-30)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=C                 LC_NAME=C
>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>
>> other attached packages:
>> [1] TxDb.Hsapiens.UCSC.hg19.knownGene_2.7.1
>> [2] GenomicFeatures_1.8.1
>> [3] AnnotationDbi_1.18.0
>> [4] Biobase_2.16.0
>> [5] GenomicRanges_1.8.3
>> [6] IRanges_1.14.2
>> [7] BiocGenerics_0.2.0
>>
>> loaded via a namespace (and not attached):
>>  [1] biomaRt_2.12.0     Biostrings_2.24.1  bitops_1.0-4.1     BSgenome_1.24.0
>>  [5] DBI_0.2-5          RCurl_1.91-1       Rsamtools_1.8.3    RSQLite_0.11.1
>>  [9] rtracklayer_1.16.1 stats4_2.15.0      tools_2.15.0       XML_3.9-4
>> [13] zlibbioc_1.2.0
>>
>