[BioC] Problem with nearest() in GenomicRanges
Valerie Obenchain
vobencha at fhcrc.org
Tue Jul 31 04:08:04 CEST 2012
Hi Chris,
It's difficult to know what's going on without a reproducible example.
Can you attach a small subset of your GRanges that still triggers the
error? If not, is there a place I can get the data for testing?
Valerie
On 07/28/12 12:03, Chris Whelan wrote:
> Hi all,
>
> I'm having an issue with the nearest() method in GenomicRanges. I've
> created GRanges from two data frames, from data for a species genome
> assembly with lots of scaffolds:
>
>> xs<- with(x.data, GRanges(seqnames=Scaffold, ranges=IRanges(start=Start, end=Stop), strand=Strand))
>> xs
> GRanges with 989 ranges and 0 elementMetadata cols:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] GL397480 [ 1477346, 1477813] -
> [2] GL397446 [ 869823, 870295] +
> ... ... ... ...
> [988] GL397464 [1138154, 1138427] -
> [989] GL397464 [1138154, 1138427] -
> ---
> seqlengths:
> ADFV01135143 ADFV01136404 GL397262 ... GL397832 GL397836
> NA NA NA ... NA NA
>
>> cpg.islands<- with(cpg.island.data, GRanges(seqnames=V1, ranges=IRanges(start=V4, end=V5), strand='*'))
>> cpg.islands
> GRanges with 88968 ranges and 0 elementMetadata cols:
> seqnames ranges strand
> <Rle> <IRanges> <Rle>
> [1] GL397261 [ 4409, 5062] *
> [2] GL397261 [ 66765, 67010] *
> ... ... ... ...
> [88967] ADFV01162706 [1834, 2052] *
> [88968] ADFV01162751 [ 226, 884] *
> ---
> seqlengths:
> ADFV01134425 ADFV01134436 ADFV01134442 ... GL405215 GL405216
> NA NA NA ... NA NA
>
> And I'd like to use nearest to find the closest island to each x:
>
>> nearest(xs, cpg.islands)
> Error in Rle(values = callGeneric(runValue(e1)[which1],
> runValue(e2)[which2]), :
> error in evaluating the argument 'values' in selecting a method for
> function 'Rle': Error in Ops.factor(runValue(e1)[which1],
> runValue(e2)[which2]) :
> level sets of factors are different
>> traceback()
> 12: Rle(values = callGeneric(runValue(e1)[which1], runValue(e2)[which2]),
> lengths = diffWithInitialZero(ends), check = FALSE)
> 11: seqnames(x) != seqnames(y)
> 10: seqnames(x) != seqnames(y)
> 9: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
> 8: `seqselect<-`(`*tmp*`, seqnames(x) != seqnames(y), value = NA)
> 7: .local(x, y, ...)
> 6: distance(x[midx], subject[p[midx]])
> 5: distance(x[midx], subject[p[midx]])
> 4: .nearest(x, subject, ignore.strand, ...)
> 3: .local(x, subject, ...)
> 2: nearest(xs, cpg.islands)
> 1: nearest(xs, cpg.islands)
>
> This same data works fine with other methods like findOverlaps():
>
>> findOverlaps(xs, cpg.islands)
> Hits of length 225
> queryLength: 989
> subjectLength: 88968
> queryHits subjectHits
> <integer> <integer>
> 1 2 75585
> 2 6 36857
> ... ... ...
> 224 963 84872
> 225 964 84872
>
> Any idea what I might be doing wrong? Here's my sessionInfo():
>
>> sessionInfo()
> R version 2.15.1 (2012-06-22)
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=C LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
> [5] LC_MONETARY=C LC_MESSAGES=C
> [7] LC_PAPER=C LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] locfit_1.5-8 GenomicFeatures_1.8.2 AnnotationDbi_1.18.1
> [4] Biobase_2.16.0 multicore_0.1-7 rtracklayer_1.16.3
> [7] GenomicRanges_1.8.9 IRanges_1.14.4 BiocGenerics_0.2.0
> [10] reshape2_1.2.1 ggplot2_0.9.1 BiocInstaller_1.4.7
>
> loaded via a namespace (and not attached):
> [1] BSgenome_1.24.0 Biostrings_2.24.1 DBI_0.2-5 MASS_7.3-19
> [5] RColorBrewer_1.0-5 RCurl_1.91-1 RSQLite_0.11.1 Rsamtools_1.8.5
> [9] XML_3.9-4 biomaRt_2.12.0 bitops_1.0-4.1 colorspace_1.1-1
> [13] compiler_2.15.1 dichromat_1.2-4 digest_0.5.2 grid_2.15.1
> [17] labeling_0.1 lattice_0.20-6 memoise_0.1 munsell_0.3
> [21] plyr_1.7.1 proto_0.3-9.2 scales_0.2.1 stats4_2.15.1
> [25] stringr_0.6.1 tools_2.15.1 zlibbioc_1.2.0
>
> Thanks in advance,
>
> Chris
>
> [[alternative HTML version deleted]]
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor
More information about the Bioconductor
mailing list