[R] replacing ugly for loops

Bert Gunter gunter.berton at gene.com
Thu Oct 11 19:57:06 CEST 2012


Sorry, you **did** supply data and my solution **does** work (except that I
left off one closing ")").

> sq.n <- seq_len(nrow(data.df))
> tapply(sq.n,data.df$seq,function(x)with(data.df[x,],
+ sort(unique(do.call(c,mapply(seq,from=startNo,length=len,SIMPLIFY=FALSE))))))
$`1`
[1]  3  4  5  6 10 11

$`2`
[1]  3  4  5  6  7 15 16 17
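
For anyone who finds the one-liner dense, the same idea can be unpacked into
named steps. This is only a sketch, not a better method: it swaps tapply()
over row indices for split() plus lapply(), spells the `length` argument out
as `length.out`, and the helper name colsForGroup is made up for illustration.

```r
# Sample data from the original post
data.df <- data.frame(seq     = c(1, 1, 2, 2),
                      startNo = c(3, 10, 3, 15),
                      len     = c(4, 2, 5, 3))

# Hypothetical helper: all column numbers needed for the rows of one file
colsForGroup <- function(d) {
  # One run per row: startNo, startNo + 1, ..., startNo + len - 1
  runs <- Map(function(from, n) seq(from = from, length.out = n),
              d$startNo, d$len)
  # Flatten, drop duplicates from overlapping tables, and sort
  sort(unique(unlist(runs)))
}

# Apply the helper to the rows belonging to each file id
colms <- lapply(split(data.df, data.df$seq), colsForGroup)
print(colms)
```

The result is a list named by file id ("1" and "2" here), holding the same
column numbers as the tapply() output above.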

Cheers,
Bert


On Wed, Oct 10, 2012 at 10:59 PM, Bert Gunter <bgunter at gene.com> wrote:
> I am not sure you have expressed what you want to do correctly. See inline:
>
> On Wed, Oct 10, 2012 at 9:10 PM, andrewH <ahoerner at rprogress.org> wrote:
>> I have a couple of hundred American Community Survey Summary Files
>> containing rectangular arrays of data, mainly though not exclusively
>> numeric.  Each file is referred to as a sequence (henceforth "seq").
> -- so 1 "seq" (terrible identifier -- see below for why) = 1 file
>
>  From
>> these files I am trying to extract particular subsets (tables) consisting of
>> sets of columns.  These tables are defined by three numbers (now in
>> columns in a data frame):
>> 1.      a file identifier (seq)
>> 2.      first column position numbers (startNo)
>> 3.      length of table (len)
>
> So your data frame, call it yourframe, has columns named:
>
> seq      startNo       len
>
>
>> so the columns to select for one triple would consist of
>> startNo:(startNo+len-1).   I am trying to create for each sequence a
>> vector of all the column numbers for tables in that sequence.
>
> So for each seq id you want to find all the column numbers, right?
>
> sq.n <- seq_len(nrow(yourframe)) ## Just to make it easier to read
> colms <-  tapply(sq.n, yourframe$seq,function(x) with(yourframe[x,],
>    sort(unique(do.call(c, mapply(seq, from=startNo,
> length=len,SIMPLIFY = FALSE)))))
>
> ## Comments
> In the mapply call, seq is the R function, ?seq.  That's why using it
> as a name for a file id is terrible -- it causes confusion.
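
A small sketch of the naming clash described above. R looks for a function
when a name is used in call position, so seq(...) keeps working even when a
data vector is also called seq, but the bare name then refers to the data,
which is where the confusion comes from. The replacement name fileId is
purely hypothetical.

```r
seq <- c(1, 1, 2, 2)                   # a data vector that shadows the name "seq"

is.function(seq)                       # FALSE: the bare name now refers to the data
run <- seq(from = 3, length.out = 4)   # a call still finds the base function
print(run)

# A distinct name (fileId is hypothetical) removes the ambiguity
fileId <- seq
rm(seq)
is.function(seq)                       # TRUE again: only the function is left
```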
>
> In the absence of data, this is untested -- and probably not quite
> right. But it should be close, I hope. The key idea is the use of
> mapply to get the sequence of columns for each row in all the rows for
> each seq id. The SIMPLIFY = FALSE guarantees that this yields a list
> of vectors of column indices, which are then glopped together and
> cleaned up by the sort(unique(do.call(  ...  stuff.
>
> colms should then be a list giving the sorted column numbers to choose
> for each "seq" id.
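
To illustrate the point about SIMPLIFY = FALSE, here is a minimal sketch using
the two rows of file 1 from the sample data. It assumes nothing beyond base R;
the variable names runs, m and cols are made up for the example, and `length`
is written out as `length.out`.

```r
startNo <- c(3, 10)
len     <- c(4, 2)

# SIMPLIFY = FALSE: always a list, one vector of column numbers per row
runs <- mapply(seq, from = startNo, length.out = len, SIMPLIFY = FALSE)
str(runs)

# With the default SIMPLIFY = TRUE, results of equal length collapse
# into a matrix, so the shape of the answer would depend on the data
m <- mapply(seq, from = startNo, length.out = c(2, 2))
is.matrix(m)

# Flatten the list, then drop duplicates and sort
cols <- sort(unique(do.call(c, runs)))
print(cols)
```

Keeping the result a list means the flattening step behaves the same whether
or not the tables in a file happen to have equal lengths.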
>
> I do not know whether (once cleaned up) this is either more elegant
> or more efficient than what you proposed. And I wouldn't be surprised
> if someone like Bill Dunlap comes up with a lot better way, either.
> But it is different -- and perhaps amusing.
>
> ... If I have properly understood what you wanted. If not, ignore all.
>
> Cheers,
> Bert
>
>>
>> Obviously I could do this with nested for loops, e.g.:
>>
>>> seq <- c(1,1,2,2)
>>> startNo  <- c(3, 10, 3, 15)
>>> len <- c(4, 2, 5, 3)
>>> data.df <- data.frame(seq, startNo, len)
>>>
>>> seq.f <- factor(data.df$seq)
>>> data.l <- split(data.df, seq.f)
>>> selectColsList<- vector("list", length(levels(seq.f)))
>>> for (i in seq_along(levels(seq.f))){
>>    selectCols <- numeric()
>>        for (j in seq_along(data.l[[i]]$startNo)){
>>            selectCols <- c(selectCols,
>> data.l[[i]]$startNo[j]:(data.l[[i]]$startNo[j] +
>>            data.l[[i]]$len[j]-1))
>>         }
>>     selectColsList[[i]] <- selectCols
>> }
>>> selectColsList
>> [[1]]
>> [1]  3  4  5  6 10 11
>> [[2]]
>> [1]  3  4  5  6  7 15 16 17
>>
>> But this code strikes me as inelegant and verbose. It seems to me that there
>> ought to be a way to make the outer loop (indexed with i) into a tapply
>> function (which is why I started with a split()), and the inner loop
>> (indexed with j) into some cute recursive function, but I was not able to do
>> so. If anyone could suggest some nicer (e.g. shorter, or faster, or just
>> more sophisticated) way to do this instead, I would be most grateful.
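
One possible loop-free variant, offered only as a sketch: it is not the
approach proposed elsewhere in this thread, and it assumes R 4.0.0 or later,
where sequence() accepts a `from` argument. The names allCols and fileOfCol
are made up for illustration.

```r
data.df <- data.frame(seq     = c(1, 1, 2, 2),
                      startNo = c(3, 10, 3, 15),
                      len     = c(4, 2, 5, 3))

# All runs laid end to end: 3 4 5 6 10 11 3 4 5 6 7 15 16 17
allCols <- sequence(data.df$len, from = data.df$startNo)

# Label each column number with the file id of the row it came from
fileOfCol <- rep(data.df$seq, times = data.df$len)

# Group by file id, then drop duplicates and sort within each file
selectColsList <- lapply(split(allCols, fileOfCol),
                         function(x) sort(unique(x)))
print(selectColsList)
```

Unlike the loop version, this returns a list named by file id rather than by
position, and it removes duplicate column numbers when tables overlap.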
>>
>> Sincerely, andrewH
>>
>>
>>
>>
>> --
>> View this message in context: http://r.789695.n4.nabble.com/replacing-ugly-for-loops-tp4645821.html
>> Sent from the R help mailing list archive at Nabble.com.
>
>
>
> --
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
>
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm






