[BioC] GRanges nearest problem
Valerie Obenchain
vobencha at fhcrc.org
Thu Apr 14 15:21:38 CEST 2011
Hi Arne,
Thanks for pointing out these bugs. I'll post again here when they have
been fixed.
Valerie
On 04/13/11 05:29, Mueller, Arne wrote:
> Hello,
>
> I've come across a problem with nearest() for GRanges objects: if the subject of the call contains strand information (+/-) and the query does not (*), the method takes a long time to run and raises warnings.
>
> mm9.pro.gr and mm9.2ktiles.gr are both GRanges objects.
>
>
>> strand(mm9.pro.gr) = "-"
>> strand(mm9.2ktiles.gr) = "*"
>> system.time(nn<- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>
> user system elapsed
> 27.150 0.002 27.416
> There were 50 or more warnings (use warnings() to see the first 50)
>
>> warnings()
>>
> Warning messages:
> 1: In start(ranges(x1Split[[st]])) - end(subSplit2) :
> longer object length is not a multiple of shorter object length
> 2: In start(ranges(x1Split[[st]])) - end(subSplit2) :
> longer object length is not a multiple of shorter object length
> 3: In start(ranges(x1Split[[st]])) - end(subSplit2) :
> longer object length is not a multiple of shorter object length
> 4: In start(ranges(x1Split[[st]])) - end(subSplit2) :
> longer object length is not a multiple of shorter object length
> …
>
> I think that if a range in either the query or the subject is non-stranded (*), the method should look for the nearest neighbor ignoring the strand (at least that's my suggestion ;-).
>
> If I set the strand info of the subject to "*" the method runs fine:
>
>
>> strand(mm9.pro.gr) = "*"
>> system.time(nn<- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>
> user system elapsed
> 0.264 0.000 0.264
>
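A minimal sketch of the workaround quoted above, on toy data rather than the mm9 objects from the report. The coordinates and the names `query`, `subject` and `subject_unstranded` are made up for illustration; only `GRanges()`, `IRanges()`, `strand<-` and `nearest()` are taken from the packages themselves. The idea is to unstrand a copy of the subject so that the original strand information is kept:

```r
library(GenomicRanges)

# Toy ranges (hypothetical coordinates, not the mm9 data from the report).
query   <- GRanges("chr1", IRanges(start = c(100, 500), width = 50), strand = "*")
subject <- GRanges("chr1", IRanges(start = c(10, 300, 900), width = 20), strand = "-")

# Workaround from the report: set the subject strand to "*" before the call.
# A copy is modified so that the stranded subject stays available.
subject_unstranded <- subject
strand(subject_unstranded) <- "*"

# Index into 'subject' of the nearest range for each query range.
nn <- nearest(query, subject_unstranded)
nn
```

Later GenomicRanges releases document an `ignore.strand` argument for `nearest()`; whether it is available depends on the installed version, so the copy-and-unstrand approach is the conservative choice.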
> If the query is "stranded" (+/-) and the subject isn't, the method runs fine, too (though slower than when both query and subject are non-stranded, but I guess this can be expected):
>
>
>> system.time(nn<- nearest(mm9.pro.gr[1:5000], mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>>
> user system elapsed
> 3.084 0.000 3.125
>
> Another odd behavior is that if the query contains sequence names not contained in the subject, an error is raised; the other way around works fine. Wouldn't it make sense to set the vector elements for sequences only found in the query to NA?
>
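The NA behavior suggested above can be approximated from the calling side. The following is a sketch under assumptions, not the package's own implementation: the toy ranges and the variable names (`shared`, `q`, `nn`) are hypothetical. It searches only for query ranges whose sequence name occurs in the subject and leaves the remaining entries as NA:

```r
library(GenomicRanges)

# Toy ranges (hypothetical): "chrX" occurs in the query but not in the subject.
query   <- GRanges(c("chr1", "chrX"), IRanges(start = c(100, 100), width = 50))
subject <- GRanges("chr1", IRanges(start = c(10, 300), width = 20))

# Which query ranges lie on a sequence that the subject also uses?
shared <- as.character(seqnames(query)) %in%
  unique(as.character(seqnames(subject)))

# Start with NA everywhere, then fill in the ranges that can be searched.
nn <- rep(NA_integer_, length(query))
if (any(shared)) {
  q <- query[shared]
  # Drop unused sequence levels so the query no longer mentions "chrX".
  seqlevels(q) <- seqlevelsInUse(q)
  nn[shared] <- nearest(q, subject)
}
nn
```

The result has one entry per query range, so it can be used to index the subject directly after checking for NA.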
> Kind regards,
>
> Arne
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor