[R] Programming R to avoid loops
    Jim Mankin 
    sammankin at gmail.com
       
    Sat Apr 18 19:55:14 CEST 2015
    
    
  
Jim Mankin liked your message with Boxer.

On April 18, 2015 at 10:48:17 AM MST, Charles C. Berry <ccberry at ucsd.edu> wrote:

On Sat, 18 Apr 2015, Brant Inman wrote:

> I have two large data frames with the following structure:
>
> > df1
>   id       date test1.result
> 1  a 2009-08-28            1
> 2  a 2009-09-16            1
> 3  b 2008-08-06            0
> 4  c 2012-02-02            1
> 5  c 2010-08-03            1
> 6  c 2012-08-02            0
>
> > df2
>   id       date test2.result
> 1  a 2011-02-03            1
> 2  b 2011-09-27            0
> 3  b 2011-09-01            1
> 4  c 2009-07-16            0
> 5  c 2009-04-15            0
> 6  c 2010-08-10            1
>
> I need to match items in df2 to those in df1 with specific matching
> criteria. I have written a looped matching algorithm that works, but it
> is very slow with my large datasets. I am requesting help on making a
> version of this code that is faster and "vectorized" so to speak.

As I see in your posted code, you match id's exactly, dates according to a range, and count the number of positive test results in the second data.frame.

For this, the countOverlaps() function of the GenomicRanges package will do the trick with suitably defined GRanges objects. Something like:

require(GenomicRanges)
date1 
date2 
lagdays 
predays 
gr1 
gr2  IRanges(start=date2+predays,end=date2+lagdays), strand="*")[ df2$test2.result==1,]
df1$test2.count 

For the example data.frames (as rendered by Jim Lemon's code), this yields

> df1
  id       date test1.result test2.count
1  a 2009-08-28            1           0
2  a 2009-09-16            1           0
3  b 2008-08-06            0           0
4  c 2012-02-02            1           0
5  c 2010-08-03            1           1
6  c 2012-08-02            0           0

The GenomicRanges package is at

http://www.bioconductor.org/packages/release/bioc/html/GenomicRanges.html

where you will find installation instructions and links to vignettes.

HTH,

Chuck

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
	[[alternative HTML version deleted]]
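
The matching rule described in the message (exact id match, date of df1 inside a window around the date of df2, count only positive test2 results) can also be sketched in base R without a loop. This is not Chuck's GenomicRanges code: the right-hand sides of his assignments did not survive in the archive, so the window values predays = -30 and lagdays = 30 below are assumptions chosen only for illustration, and outer() is swapped in for countOverlaps().

```r
# Example data as posted in the thread.
df1 <- data.frame(
  id = c("a", "a", "b", "c", "c", "c"),
  date = as.Date(c("2009-08-28", "2009-09-16", "2008-08-06",
                   "2012-02-02", "2010-08-03", "2012-08-02")),
  test1.result = c(1, 1, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  id = c("a", "b", "b", "c", "c", "c"),
  date = as.Date(c("2011-02-03", "2011-09-27", "2011-09-01",
                   "2009-07-16", "2009-04-15", "2010-08-10")),
  test2.result = c(1, 0, 1, 0, 0, 1),
  stringsAsFactors = FALSE
)

# ASSUMPTION: window bounds in days; the values used in the
# original message are not recoverable from the archive.
predays <- -30L
lagdays <- 30L

# Keep only positive test2 results, as in the original snippet.
pos <- df2[df2$test2.result == 1, ]

# Dates as integer day counts so windows are plain arithmetic.
d1 <- as.integer(df1$date)
d2 <- as.integer(pos$date)

# Each outer() call builds an nrow(df1) x nrow(pos) logical matrix.
same_id   <- outer(df1$id, pos$id, "==")
in_window <- outer(d1, d2 + predays, ">=") & outer(d1, d2 + lagdays, "<=")

# Count, per df1 row, the positive df2 rows that match on both criteria.
df1$test2.count <- rowSums(same_id & in_window)
print(df1)
```

With these assumed bounds the counts agree with the test2.count column shown in the message. The outer() matrices take memory proportional to nrow(df1) times the number of positive df2 rows, so for very large data frames a range-based approach such as the GenomicRanges one suggested above is likely the better fit.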
    
    