[R] two difficult loop

greg holly mak.hholly at gmail.com
Mon Jun 13 16:29:07 CEST 2016


Hi Jim;

I do apologize if bothering. I have run on the real data and here is the
error message I got:

Thanks,

Greg

start  end
Error in regrange[1]:regrange[2] : result would be too long a vector
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf

On Mon, Jun 13, 2016 at 10:28 AM, greg holly <mak.hholly at gmail.com> wrote:

> Hi Jim;
>
> I do apologize if bothering. I have run on the real data and here is the
> error message I got:
>
> Thanks,
>
> Greg
>
> start  end
> Error in regrange[1]:regrange[2] : result would be too long a vector
> In addition: Warning messages:
> 1: In min(x) : no non-missing arguments to min; returning Inf
> 2: In max(x) : no non-missing arguments to max; returning -Inf
>
>
> On Mon, Jun 13, 2016 at 3:19 AM, Jim Lemon <drjimlemon at gmail.com> wrote:
>
>> Hi Greg,
>> Okay, I have a better idea now of what you want. The problem of
>> multiple matches is still there, but here is a start:
>>
>> # this data frame actually contains all the values in ref in the "reg"
>> field
>> map<-read.table(text="reg p rate
>>  10276 0.700  3.867e-18
>>  71608 0.830  4.542e-16
>>  29220 0.430  1.948e-15
>>  99542 0.220  1.084e-15
>>  26441 0.880  9.675e-14
>>  95082 0.090  7.349e-13
>>  36169 0.480  9.715e-13
>>  55572 0.500  9.071e-12
>>  65255 0.300  1.688e-11
>>  51960 0.970  1.163e-10
>>  55652 0.388  3.750e-10
>>  63933 0.250  9.128e-10
>>  35170 0.720  7.355e-09
>>  06491 0.370  1.634e-08
>>  85508 0.470  1.057e-07
>>  86666 0.580  7.862e-07
>>  04758 0.810  9.501e-07
>>  06169 0.440  1.104e-06
>>  63933 0.750  2.624e-06
>>  41838 0.960  8.119e-06
>>  74806 0.810  9.501e-07
>>  92643 0.470  1.057e-07
>>  73732 0.090  7.349e-13
>>  82451 0.960  8.119e-06
>>  86042 0.480  9.715e-13
>>  93502 0.500  9.071e-12
>>  85508 0.370  1.634e-08
>>  95082 0.830  4.542e-16",
>>  header=TRUE)
>> # same as in your example
>> ref<-read.table(text="reg1 reg2
>>  29220     63933
>>  26441     41838
>>  06169     10276
>>  74806     92643
>>  73732     82451
>>  86042     93502
>>  85508     95082",
>>  header=TRUE)
>> # sort the "map" data frame
>> map2<-map[order(map$reg),]
>> # get a field for the counts
>> ref$n<-NA
>> # and a field for the minimum p values
>> ref$min_p<-NA
>> # get the number of rows in "ref"
>> nref<-dim(ref)[1]
>> for(i in 1:nref) {
>>  start<-which(map2$reg==ref$reg1[i])
>>  end<-which(map2$reg==ref$reg2[i])
>>  cat("start",start,"end",end,"\n")
>>  # get the range of matches
>>  regrange<-range(c(start,end))
>>  # convert this to a sequence spanning all matches
>>  allreg<-regrange[1]:regrange[2]
>>  ref$n[i]<-sum(map2$p[allreg] > 0.85)
>>  ref$min_p[i]<-min(map2$p[allreg])
>> }
>>
>> This example uses the span from the first match of "reg1" to the last
>> match of "reg2". This may not be what you want, so let me know if
>> there are further constraints.
>>
>> Jim
>>
>> On Mon, Jun 13, 2016 at 12:35 PM, greg holly <mak.hholly at gmail.com>
>> wrote:
>> > Hi Bert;
>> >
>> > I do appreciate for this. I need check your codes on task2 tomorrow at
>> my
>> > office on the real data as I have difficulty (because a technical
>> issue) to
>> > remote connection. I am sure it will work well.
>> >
>> > I am sorry that I was not able to explain my first question. Basically
>> >
>> > Values in ref data represent the region of chromosome. I need choose
>> these
>> > regions in map (all regions values in ref data are exist in map data in
>> the
>> > first column -column map$reg). And then summing up the column "map$rate
>> and
>> > count the numbers that gives >0.85. For example, consider  the first
>> row in
>> > data ref. They are 29220   and  63933. After sorting the first column in
>> > map then summing column "map$rate" only between 29220   to  63933 in
>> sorted
>> > map and cut off at >0.85. Then count how many rows in sorted map gives
>> >>0.85. For example consider there are 38 rows between 29220   in  63933
>> in sorted
>> > map$reg and only summing first 12 of them  gives>0.85. Then my answer is
>> > going to be 12 for 29220   -  63933 in ref.
>> >
>> > Thanks I lot for your patience.
>> >
>> > Cheers,
>> > Greg
>> >
>> > On Sun, Jun 12, 2016 at 10:35 PM, greg holly <mak.hholly at gmail.com>
>> wrote:
>> >
>> >> Hi Bert;
>> >>
>> >> I do appreciate for this. I need check your codes on task2 tomorrow at
>> my
>> >> office on the real data as I have difficulty (because a technical
>> issue) to
>> >> remote connection. I am sure it will work well.
>> >>
>> >> I am sorry that I was not able to explain my first question. Basically
>> >>
>> >> Values in ref data represent the region of chromosome. I need choose
>> these
>> >> regions in map (all regions values in ref data are exist in map data
>> in the
>> >> first column -column map$reg). And then summing up the column
>> "map$rate and
>> >> count the numbers that gives >0.85. For example, consider  the first
>> row in
>> >> data ref. They are 29220   and  63933. After sorting the first column
>> in
>> >> map then summing column "map$rate" only between 29220   to  63933 in
>> >> sorted map and cut off at >0.85. Then count how many rows in sorted map
>> >> gives >0.85. For example consider there are 38 rows between 29220   in
>> >>  63933 in sorted map$reg and only summing first 12 of them  gives>0.85.
>> >> Then my answer is going to be 12 for 29220   -  63933 in ref.
>> >>
>> >> Thanks I lot for your patience.
>> >>
>> >> Cheers,
>> >> Greg
>> >>
>> >> On Sun, Jun 12, 2016 at 6:36 PM, Bert Gunter <bgunter.4567 at gmail.com>
>> >> wrote:
>> >>
>> >>> Greg:
>> >>>
>> >>> I was not able to understand your task 1. Perhaps others can.
>> >>>
>> >>> My understanding of your task 2 is that for each row of ref, you wish
>> >>> to find all rows,of map such that the reg values in those rows fall
>> >>> between the reg1 and reg2 values in ref (inclusive change <= to < if
>> >>> you don't want the endpoints), and then you want the minimum map$p
>> >>> values of all those rows. If that is correct, I believe this will do
>> >>> it (but caution, untested, as you failed to provide data in a
>> >>> convenient form, e.g. using dput() )
>> >>>
>> >>> task2 <- with(map,vapply(seq_len(nrow(ref)),function(i)
>> >>> min(p[ref[i,1]<=reg & reg <= ref[i,2] ]),0))
>> >>>
>> >>>
>> >>> If my understanding is incorrect, please ignore both the above and the
>> >>> following:
>> >>>
>> >>>
>> >>> The "solution" I have given above seems inefficient, so others may be
>> >>> able to significantly improve it if you find that it takes too long.
>> >>> OTOH, my understanding of your specification is that you need to
>> >>> search for all rows in map data frame that meet the criterion for each
>> >>> row of ref, and without further information, I don't know how to do
>> >>> this without just repeating the search 560 times.
>> >>>
>> >>>
>> >>> Cheers,
>> >>> Bert
>> >>>
>> >>>
>> >>> Bert Gunter
>> >>>
>> >>> "The trouble with having an open mind is that people keep coming along
>> >>> and sticking things into it."
>> >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> >>>
>> >>>
>> >>> On Sun, Jun 12, 2016 at 1:14 PM, greg holly <mak.hholly at gmail.com>
>> wrote:
>> >>> > Dear all;
>> >>> >
>> >>> >
>> >>> >
>> >>> > I have two data sets, data=map and data=ref). A small part of each
>> data
>> >>> set
>> >>> > are given below. Data map has more than 27 million and data ref has
>> >>> about
>> >>> > 560 rows. Basically I need run two different task. My R codes for
>> these
>> >>> > task are given below but they do not work properly.
>> >>> >
>> >>> > I sincerely do appreciate your helps.
>> >>> >
>> >>> >
>> >>> > Regards,
>> >>> >
>> >>> > Greg
>> >>> >
>> >>> >
>> >>> >
>> >>> > Task 1)
>> >>> >
>> >>> > For example, the first and second columns for row 1 in data ref are
>> >>> 29220
>> >>> > 63933. So I need write an R code normally first look the first row
>> in
>> >>> ref
>> >>> > (which they are 29220 and 63933) than summing the column of
>> "map$rate"
>> >>> and
>> >>> > give the number of rows that >0.85. Then do the same for the second,
>> >>> > third....in ref. At the end I would like a table gave below (the
>> >>> results I
>> >>> > need). Please notice the all value specified in ref data file are
>> exist
>> >>> in
>> >>> > map$reg column.
>> >>> >
>> >>> >
>> >>> >
>> >>> > Task2)
>> >>> >
>> >>> > Again example, the first and second columns for row 1 in data ref
>> are
>> >>> 29220
>> >>> > 63933. So I need write an R code give the minimum map$p for the
>> 29220
>> >>> > -63933 intervals in map file. Than
>> >>> >
>> >>> > do the same for the second, third....in ref.
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>> > #my attempt for the first question
>> >>> >
>> >>> > temp<-map[order(map$reg, map$p),]
>> >>> >
>> >>> > count<-1
>> >>> >
>> >>> > temp<-unique(temp$reg
>> >>> >
>> >>> > for(i in 1:length(ref) {
>> >>> >
>> >>> >   for(j in 1:length(ref)
>> >>> >
>> >>> >   {
>> >>> >
>> >>> > temp1<-if (temp[pos[i]==ref[ref$reg1,] &
>> (temp[pos[j]==ref[ref$reg2,]
>> >>> > & temp[cumsum(temp$rate)
>> >>> >>0.70,])
>> >>> >
>> >>> > count=count+1
>> >>> >
>> >>> >     }
>> >>> >
>> >>> > }
>> >>> >
>> >>> > #my attempt for the second question
>> >>> >
>> >>> >
>> >>> >
>> >>> > temp<-map[order(map$reg, map$p),]
>> >>> >
>> >>> > count<-1
>> >>> >
>> >>> > temp<-unique(temp$reg
>> >>> >
>> >>> > for(i in 1:length(ref) {
>> >>> >
>> >>> >   for(j in 1:length(ref)
>> >>> >
>> >>> >   {
>> >>> >
>> >>> > temp2<-if (temp[pos[i]==ref[ref$reg1,] &
>> (temp[pos[j]==ref[ref$reg2,])
>> >>> >
>> >>> > output<-temp2[temp2$p==min(temp2$p),]
>> >>> >
>> >>> >     }
>> >>> >
>> >>> > }
>> >>> >
>> >>> >
>> >>> >
>> >>> > Data sets
>> >>> >
>> >>> >
>> >>> >   Data= map
>> >>> >
>> >>> >   reg   p      rate
>> >>> >
>> >>> >  10276 0.700  3.867e-18
>> >>> >
>> >>> >  71608 0.830  4.542e-16
>> >>> >
>> >>> >  29220 0.430  1.948e-15
>> >>> >
>> >>> >  99542 0.220  1.084e-15
>> >>> >
>> >>> >  26441 0.880  9.675e-14
>> >>> >
>> >>> >  95082 0.090  7.349e-13
>> >>> >
>> >>> >  36169 0.480  9.715e-13
>> >>> >
>> >>> >  55572 0.500  9.071e-12
>> >>> >
>> >>> >  65255 0.300  1.688e-11
>> >>> >
>> >>> >  51960 0.970  1.163e-10
>> >>> >
>> >>> >  55652 0.388  3.750e-10
>> >>> >
>> >>> >  63933 0.250  9.128e-10
>> >>> >
>> >>> >  35170 0.720  7.355e-09
>> >>> >
>> >>> >  06491 0.370  1.634e-08
>> >>> >
>> >>> >  85508 0.470  1.057e-07
>> >>> >
>> >>> >  86666 0.580  7.862e-07
>> >>> >
>> >>> >  04758 0.810  9.501e-07
>> >>> >
>> >>> >  06169 0.440  1.104e-06
>> >>> >
>> >>> >  63933 0.750  2.624e-06
>> >>> >
>> >>> >  41838 0.960  8.119e-06
>> >>> >
>> >>> >
>> >>> >  data=ref
>> >>> >
>> >>> >   reg1         reg2
>> >>> >
>> >>> >   29220     63933
>> >>> >
>> >>> >   26441     41838
>> >>> >
>> >>> >   06169     10276
>> >>> >
>> >>> >   74806     92643
>> >>> >
>> >>> >   73732     82451
>> >>> >
>> >>> >   86042     93502
>> >>> >
>> >>> >   85508     95082
>> >>> >
>> >>> >
>> >>> >
>> >>> >        the results I need
>> >>> >
>> >>> >      reg1      reg2 n
>> >>> >
>> >>> >    29220   63933  12
>> >>> >
>> >>> >    26441   41838   78
>> >>> >
>> >>> >    06169 10276  125
>> >>> >
>> >>> >    74806 92643   11
>> >>> >
>> >>> >    73732 82451   47
>> >>> >
>> >>> >    86042 93502   98
>> >>> >
>> >>> >    85508 95082  219
>> >>> >
>> >>> >         [[alternative HTML version deleted]]
>> >>> >
>> >>> > ______________________________________________
>> >>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >>> > PLEASE do read the posting guide
>> >>> http://www.R-project.org/posting-guide.html
>> >>> > and provide commented, minimal, self-contained, reproducible code.
>> >>>
>> >>
>> >>
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>
>

	[[alternative HTML version deleted]]



More information about the R-help mailing list