[R] two difficult loop

Tue Jun 14 09:39:52 CEST 2016

Hi Greg,
This is obviously a problem with the data. The first error indicates
that the sequence of integers regrange[1]:regrange[2] is too long to
be allocated. Most likely this is because one or both of the endpoints
are infinite. Maybe if you can find where the NAs are you can fix it.

Jim

On Tue, Jun 14, 2016 at 12:29 AM, greg holly <mak.hholly at gmail.com> wrote:
> Hi Jim;
>
> I do apologize if bothering. I have run on the real data and here is the
> error message I got:
>
> Thanks,
>
> Greg
>
> start  end
> Error in regrange[1]:regrange[2] : result would be too long a vector
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
>
> On Mon, Jun 13, 2016 at 10:28 AM, greg holly <mak.hholly at gmail.com> wrote:
>>
>> Hi Jim;
>>
>> I do apologize if bothering. I have run on the real data and here is the
>> error message I got:
>>
>> Thanks,
>>
>> Greg
>>
>> start  end
>> Error in regrange[1]:regrange[2] : result would be too long a vector
>> In addition: Warning messages:
>> 1: In min(x) : no non-missing arguments to min; returning Inf
>> 2: In max(x) : no non-missing arguments to max; returning -Inf
>>
>>
>> On Mon, Jun 13, 2016 at 3:19 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>>>
>>> Hi Greg,
>>> Okay, I have a better idea now of what you want. The problem of
>>> multiple matches is still there, but here is a start:
>>>
>>> # this data frame actually contains all the values in ref in the "reg"
>>> field
>>> map<-read.table(text="reg p rate
>>>  10276 0.700  3.867e-18
>>>  71608 0.830  4.542e-16
>>>  29220 0.430  1.948e-15
>>>  99542 0.220  1.084e-15
>>>  26441 0.880  9.675e-14
>>>  95082 0.090  7.349e-13
>>>  36169 0.480  9.715e-13
>>>  55572 0.500  9.071e-12
>>>  65255 0.300  1.688e-11
>>>  51960 0.970  1.163e-10
>>>  55652 0.388  3.750e-10
>>>  63933 0.250  9.128e-10
>>>  35170 0.720  7.355e-09
>>>  06491 0.370  1.634e-08
>>>  85508 0.470  1.057e-07
>>>  86666 0.580  7.862e-07
>>>  04758 0.810  9.501e-07
>>>  06169 0.440  1.104e-06
>>>  63933 0.750  2.624e-06
>>>  41838 0.960  8.119e-06
>>>  74806 0.810  9.501e-07
>>>  92643 0.470  1.057e-07
>>>  73732 0.090  7.349e-13
>>>  82451 0.960  8.119e-06
>>>  86042 0.480  9.715e-13
>>>  93502 0.500  9.071e-12
>>>  85508 0.370  1.634e-08
>>>  95082 0.830  4.542e-16",
>>>  header=TRUE)
>>> # same as in your example
>>> ref<-read.table(text="reg1 reg2
>>>  29220     63933
>>>  26441     41838
>>>  06169     10276
>>>  74806     92643
>>>  73732     82451
>>>  86042     93502
>>>  85508     95082",
>>>  header=TRUE)
>>> # sort the "map" data frame
>>> map2<-map[order(map$reg),]
>>> # get a field for the counts
>>> ref$n<-NA
>>> # and a field for the minimum p values
>>> ref$min_p<-NA
>>> # get the number of rows in "ref"
>>> nref<-dim(ref)[1]
>>> for(i in 1:nref) {
>>>  start<-which(map2$reg==ref$reg1[i])
>>>  end<-which(map2$reg==ref$reg2[i])
>>>  cat("start",start,"end",end,"\n")
>>>  # get the range of matches
>>>  regrange<-range(c(start,end))
>>>  # convert this to a sequence spanning all matches
>>>  allreg<-regrange[1]:regrange[2]
>>>  ref$n[i]<-sum(map2$p[allreg] > 0.85)
>>>  ref$min_p[i]<-min(map2$p[allreg])
>>> }
>>>
>>> This example uses the span from the first match of "reg1" to the last
>>> match of "reg2". This may not be what you want, so let me know if
>>> there are further constraints.
>>>
>>> Jim
>>>
>>> On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hholly at gmail.com>
>>> wrote:
>>> > Hi Bert;
>>> >
>>> > I do appreciate for this. I need check your codes on task2 tomorrow at
>>> > my
>>> > office on the real data as I have difficulty (because a technical
>>> > issue) to
>>> > remote connection. I am sure it will work well.
>>> >
>>> > I am sorry that I was not able to explain my first question. Basically
>>> >
>>> > Values in ref data represent the region of chromosome. I need choose
>>> > these
>>> > regions in map (all regions values in ref data are exist in map data in
>>> > the
>>> > first column -column map$reg). And then summing up the column "map$rate
>>> > and
>>> > count the numbers that gives >0.85. For example, consider  the first
>>> > row in
>>> > data ref. They are 29220   and  63933. After sorting the first column
>>> > in
>>> > map then summing column "map$rate" only between 29220   to  63933 in
>>> > sorted
>>> > map and cut off at >0.85. Then count how many rows in sorted map gives
>>> >>0.85. For example consider there are 38 rows between 29220   in  63933
>>> >> in sorted
>>> > map$reg and only summing first 12 of them  gives>0.85. Then my answer
>>> > is
>>> > going to be 12 for 29220   -  63933 in ref.
>>> >
>>> > Thanks I lot for your patience.
>>> >
>>> > Cheers,
>>> > Greg
>>> >
>>> > On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hholly at gmail.com>
>>> > wrote:
>>> >
>>> >> Hi Bert;
>>> >>
>>> >> I do appreciate for this. I need check your codes on task2 tomorrow at
>>> >> my
>>> >> office on the real data as I have difficulty (because a technical
>>> >> issue) to
>>> >> remote connection. I am sure it will work well.
>>> >>
>>> >> I am sorry that I was not able to explain my first question. Basically
>>> >>
>>> >> Values in ref data represent the region of chromosome. I need choose
>>> >> these
>>> >> regions in map (all regions values in ref data are exist in map data
>>> >> in the
>>> >> first column -column map$reg). And then summing up the column
>>> >> "map$rate and
>>> >> count the numbers that gives >0.85. For example, consider  the first
>>> >> row in
>>> >> data ref. They are 29220   and  63933. After sorting the first column
>>> >> in
>>> >> map then summing column "map$rate" only between 29220   to  63933 in
>>> >> sorted map and cut off at >0.85. Then count how many rows in sorted
>>> >> map
>>> >> gives >0.85. For example consider there are 38 rows between 29220   in
>>> >>  63933 in sorted map$reg and only summing first 12 of them
>>> >> gives>0.85.
>>> >> Then my answer is going to be 12 for 29220   -  63933 in ref.
>>> >>
>>> >> Thanks I lot for your patience.
>>> >>
>>> >> Cheers,
>>> >> Greg
>>> >>
>>> >> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <bgunter.4567 at gmail.com>
>>> >> wrote:
>>> >>
>>> >>> Greg:
>>> >>>
>>> >>> I was not able to understand your task 1. Perhaps others can.
>>> >>>
>>> >>> My understanding of your task 2 is that for each row of ref, you wish
>>> >>> to find all rows,of map such that the reg values in those rows fall
>>> >>> between the reg1 and reg2 values in ref (inclusive change <= to < if
>>> >>> you don't want the endpoints), and then you want the minimum map$p
>>> >>> values of all those rows. If that is correct, I believe this will do
>>> >>> it (but caution, untested, as you failed to provide data in a
>>> >>> convenient form, e.g. using dput() )
>>> >>>
>>> >>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i)
>>> >>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0))
>>> >>>
>>> >>>
>>> >>> If my understanding is incorrect, please ignore both the above and
>>> >>> the
>>> >>> following:
>>> >>>
>>> >>>
>>> >>> The "solution" I have given above seems inefficient, so others may be
>>> >>> able to significantly improve it if you find that it takes too long.
>>> >>> OTOH, my understanding of your specification is that you need to
>>> >>> search for all rows in map data frame that meet the criterion for
>>> >>> each
>>> >>> row of ref, and without further information, I don't know how to do
>>> >>> this without just repeating the search 560 times.
>>> >>>
>>> >>>
>>> >>> Cheers,
>>> >>> Bert
>>> >>>
>>> >>>
>>> >>> Bert Gunter
>>> >>>
>>> >>> "The trouble with having an open mind is that people keep coming
>>> >>> along
>>> >>> and sticking things into it."
>>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>> >>>
>>> >>>
>>> >>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hholly at gmail.com>
>>> >>> wrote:
>>> >>> > Dear all;
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > I have two data sets, data=map and data=ref). A small part of each
>>> >>> > data
>>> >>> set
>>> >>> > are given below. Data map has more than 27 million and data ref has
>>> >>> about
>>> >>> > 560 rows. Basically I need run two different task. My R codes for
>>> >>> > these
>>> >>> > task are given below but they do not work properly.
>>> >>> >
>>> >>> > I sincerely do appreciate your helps.
>>> >>> >
>>> >>> >
>>> >>> > Regards,
>>> >>> >
>>> >>> > Greg
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > Task 1)
>>> >>> >
>>> >>> > For example, the first and second columns for row 1 in data ref are
>>> >>> 29220
>>> >>> > 63933. So I need write an R code normally first look the first row
>>> >>> > in
>>> >>> ref
>>> >>> > (which they are 29220 and 63933) than summing the column of
>>> >>> > "map$rate"
>>> >>> and
>>> >>> > give the number of rows that >0.85. Then do the same for the
>>> >>> > second,
>>> >>> > third....in ref. At the end I would like a table gave below (the
>>> >>> results I
>>> >>> > need). Please notice the all value specified in ref data file are
>>> >>> > exist
>>> >>> in
>>> >>> > map$reg column.
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > Task2)
>>> >>> >
>>> >>> > Again example, the first and second columns for row 1 in data ref
>>> >>> > are
>>> >>> 29220
>>> >>> > 63933. So I need write an R code give the minimum map$p for the
>>> >>> > 29220
>>> >>> > -63933 intervals in map file. Than
>>> >>> >
>>> >>> > do the same for the second, third....in ref.
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > #my attempt for the first question
>>> >>> >
>>> >>> > temp<-map[order(map$reg, map$p),]
>>> >>> >
>>> >>> > count<-1
>>> >>> >
>>> >>> > temp<-unique(temp$reg
>>> >>> >
>>> >>> > for(i in 1:length(ref) {
>>> >>> >
>>> >>> >   for(j in 1:length(ref)
>>> >>> >
>>> >>> >   {
>>> >>> >
>>> >>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] &
>>> >>> > (temp[pos[j]==ref[ref$reg2,]
>>> >>> > & temp[cumsum(temp$rate)
>>> >>> >>0.70,])
>>> >>> >
>>> >>> > count=count+1
>>> >>> >
>>> >>> >     }
>>> >>> >
>>> >>> > }
>>> >>> >
>>> >>> > #my attempt for the second question
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > temp<-map[order(map$reg, map$p),]
>>> >>> >
>>> >>> > count<-1
>>> >>> >
>>> >>> > temp<-unique(temp$reg
>>> >>> >
>>> >>> > for(i in 1:length(ref) {
>>> >>> >
>>> >>> >   for(j in 1:length(ref)
>>> >>> >
>>> >>> >   {
>>> >>> >
>>> >>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] &
>>> >>> > (temp[pos[j]==ref[ref$reg2,])
>>> >>> >
>>> >>> > output<-temp2[temp2$p==min(temp2$p),]
>>> >>> >
>>> >>> >     }
>>> >>> >
>>> >>> > }
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> > Data sets
>>> >>> >
>>> >>> >
>>> >>> >   Data= map
>>> >>> >
>>> >>> >   reg   p      rate
>>> >>> >
>>> >>> >  10276 0.700  3.867e-18
>>> >>> >
>>> >>> >  71608 0.830  4.542e-16
>>> >>> >
>>> >>> >  29220 0.430  1.948e-15
>>> >>> >
>>> >>> >  99542 0.220  1.084e-15
>>> >>> >
>>> >>> >  26441 0.880  9.675e-14
>>> >>> >
>>> >>> >  95082 0.090  7.349e-13
>>> >>> >
>>> >>> >  36169 0.480  9.715e-13
>>> >>> >
>>> >>> >  55572 0.500  9.071e-12
>>> >>> >
>>> >>> >  65255 0.300  1.688e-11
>>> >>> >
>>> >>> >  51960 0.970  1.163e-10
>>> >>> >
>>> >>> >  55652 0.388  3.750e-10
>>> >>> >
>>> >>> >  63933 0.250  9.128e-10
>>> >>> >
>>> >>> >  35170 0.720  7.355e-09
>>> >>> >
>>> >>> >  06491 0.370  1.634e-08
>>> >>> >
>>> >>> >  85508 0.470  1.057e-07
>>> >>> >
>>> >>> >  86666 0.580  7.862e-07
>>> >>> >
>>> >>> >  04758 0.810  9.501e-07
>>> >>> >
>>> >>> >  06169 0.440  1.104e-06
>>> >>> >
>>> >>> >  63933 0.750  2.624e-06
>>> >>> >
>>> >>> >  41838 0.960  8.119e-06
>>> >>> >
>>> >>> >
>>> >>> >  data=ref
>>> >>> >
>>> >>> >   reg1         reg2
>>> >>> >
>>> >>> >   29220     63933
>>> >>> >
>>> >>> >   26441     41838
>>> >>> >
>>> >>> >   06169     10276
>>> >>> >
>>> >>> >   74806     92643
>>> >>> >
>>> >>> >   73732     82451
>>> >>> >
>>> >>> >   86042     93502
>>> >>> >
>>> >>> >   85508     95082
>>> >>> >
>>> >>> >
>>> >>> >
>>> >>> >        the results I need
>>> >>> >
>>> >>> >      reg1      reg2 n
>>> >>> >
>>> >>> >    29220   63933  12
>>> >>> >
>>> >>> >    26441   41838   78
>>> >>> >
>>> >>> >    06169 10276  125
>>> >>> >
>>> >>> >    74806 92643   11
>>> >>> >
>>> >>> >    73732 82451   47
>>> >>> >
>>> >>> >    86042 93502   98
>>> >>> >
>>> >>> >    85508 95082  219
>>> >>> >
>>> >>> >         [[alternative HTML version deleted]]
>>> >>> >
>>> >>> > ______________________________________________
>>> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> >>> > PLEASE do read the posting guide
>>> >>> http://www.R-project.org/posting-guide.html
>>> >>> > and provide commented, minimal, self-contained, reproducible code.
>>> >>>
>>> >>
>>> >>
>>> >
>>> >         [[alternative HTML version deleted]]
>>> >
>>> > ______________________________________________
>>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > PLEASE do read the posting guide
>>> > http://www.R-project.org/posting-guide.html
>>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>