[BioC] GRanges nearest problem

Thu Apr 14 15:21:38 CEST 2011

Hi Arne,

Thanks for pointing out these bugs. I'll post again here when they have 
been fixed.

Valerie

On 04/13/11 05:29, Mueller, Arne wrote:
> Hello,
>
> I've come across a problem with nearest() for GRanges objects: if the subject of the nearest() call contains strand information (+/-) and the query does not (*), the method takes a long time to run and raises warnings.
>
> mm9.pro.gr and mm9.2ktiles.gr are both GRanges objects.
>
>> strand(mm9.pro.gr) = "-"
>> strand(mm9.2ktiles.gr) = "*"
>> system.time(nn <- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>     user  system elapsed
>   27.150   0.002  27.416
> There were 50 or more warnings (use warnings() to see the first 50)
>> warnings()
> Warning messages:
> 1: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>    longer object length is not a multiple of shorter object length
> 2: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>    longer object length is not a multiple of shorter object length
> 3: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>    longer object length is not a multiple of shorter object length
> 4: In start(ranges(x1Split[[st]])) - end(subSplit2) :
>    longer object length is not a multiple of shorter object length
> …
>
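
For reference, the warning text above is R's generic vector-recycling warning: it is raised when two vectors of different lengths are combined arithmetically and the longer length is not a multiple of the shorter one. Below is a minimal base-R sketch of the same kind of subtraction. The vectors `starts` and `ends` are made up for illustration, and this is not the GenomicRanges code itself.

```r
# Hypothetical coordinates standing in for start(...) and end(...) vectors
starts <- c(100L, 200L, 300L, 400L, 500L)   # length 5
ends   <- c(50L, 150L)                      # length 2

# Capture the warning raised by the length-mismatched subtraction
msg <- tryCatch(
  { starts - ends; NA_character_ },
  warning = function(w) conditionMessage(w)
)
print(msg)

# A result is still computed, with 'ends' recycled to length 5
d <- suppressWarnings(starts - ends)
print(d)
```

Because the shorter vector is recycled, the element-wise pairing in the result is not the one-to-one pairing a distance calculation would normally intend.
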
> I think that if a range in either the query or the subject is non-stranded (*), the method should look for the nearest neighbor ignoring the strand (at least that's my suggestion ;-).
>
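
The suggested rule can be written down as a small base-R predicate. This is only a sketch of the proposal as worded above; the helper `strand_compatible` is hypothetical and is not part of GenomicRanges.

```r
# Hypothetical helper: TRUE where a query strand and a subject strand
# may be paired. "*" is treated as matching "+", "-", or "*".
strand_compatible <- function(q, s) q == "*" | s == "*" | q == s

ok <- strand_compatible(c("*", "+", "+", "-"),
                        c("-", "*", "-", "-"))
print(ok)
```

Under this rule only the "+" versus "-" pair is excluded from the neighbor search.
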
> If I set the strand info of the subject to "*" the method runs fine:
>
>> strand(mm9.pro.gr) = "*"
>> system.time(nn <- nearest(mm9.2ktiles.gr[1:5000], mm9.pro.gr[1:5000]))
>     user  system elapsed
>    0.264   0.000   0.264
>
> If the query is "stranded" (+/-) and the subject isn't, the method runs fine, too (though it takes longer than when both query and subject are non-stranded, which I guess is to be expected):
>
>> system.time(nn <- nearest(mm9.pro.gr[1:5000], mm9.2ktiles.gr[1:5000]))
>     user  system elapsed
>    3.084   0.000   3.125
>
> Another odd behavior is that if the query contains sequence names not contained in the subject, an error is raised; the other way around works fine. Wouldn't it make sense to set the vector elements for sequences found only in the query to NA?
>
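
To make the proposed NA behavior concrete, here is a base-R sketch. The helper `nearest_or_na` is hypothetical and works on single positions rather than ranges to keep it short. It is not how nearest() is implemented in GenomicRanges.

```r
# Hypothetical helper: for each query position, return the index of the
# nearest subject position on the same sequence, or NA if the subject
# has no entries for that sequence name.
nearest_or_na <- function(q_seq, q_pos, s_seq, s_pos) {
  vapply(seq_along(q_seq), function(i) {
    candidates <- which(s_seq == q_seq[i])
    if (length(candidates) == 0L) return(NA_integer_)
    candidates[which.min(abs(s_pos[candidates] - q_pos[i]))]
  }, integer(1))
}

# Made-up data: "chrX" occurs in the query only
q_seq <- c("chr1", "chr2", "chrX")
q_pos <- c(120, 900, 50)
s_seq <- c("chr1", "chr1", "chr2")
s_pos <- c(100, 500, 1000)

res <- nearest_or_na(q_seq, q_pos, s_seq, s_pos)
print(res)
```

The query element on "chrX" gets NA instead of raising an error, which is the behavior asked about above.
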
>      Kind regards,
>
>      Arne
>
> _______________________________________________
> Bioconductor mailing list
> Bioconductor at r-project.org
> https://stat.ethz.ch/mailman/listinfo/bioconductor
> Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor