[R] Basic question about re-writing for loop as a function

Patrick Burns pburns at pburns.seanet.com
Mon Aug 29 18:51:47 CEST 2011


You are somewhere in Circles 3 and 4 of
'The R Inferno'.

If you have a function to apply over more
than one argument, then 'mapply' will do
that.
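
For illustration only, here is a minimal sketch of what 'mapply' does
with two arguments. The helper 'has_code' and the small 'words' vector
are made up for this example and are not from the original data:

```r
# Hypothetical helper: returns 1 if the numeric code `code` appears in the
# comma-separated string `words`, and 0 otherwise.
has_code <- function(words, code) {
  as.numeric(code %in% as.numeric(unlist(strsplit(words, split = ","))))
}

words <- c("1", "2,3", "1")

# mapply walks both vectors in parallel:
# ("1", 1), then ("2,3", 3), then ("1", 2)
res <- mapply(has_code, words, c(1, 3, 2), USE.NAMES = FALSE)

# To cover every row/column pair, enumerate the pairs first.
# expand.grid varies `row` fastest, which matches the column-major
# order that matrix() uses when filling.
grid <- expand.grid(row = seq_along(words), col = 1:3)
m <- matrix(mapply(has_code, words[grid$row], grid$col, USE.NAMES = FALSE),
            nrow = length(words))
```

This still calls the helper once per cell, so it is a tidier loop
rather than a faster one.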

But you don't need to do that -- you can do
the operation you want efficiently:

*) create your result as a matrix of all zeros;
there is almost surely no reason for it to be
a data frame.

mainmat <- matrix(0, ncol=92, nrow=...)

*) create a subscripting matrix giving the row
and column combinations to change to 1.  Here is
a small example:

 > ss <- strsplit(c("1", "2,3", "1"), split=",")
 > sr <- rep(1:length(ss), sapply(ss, length))
 > sr
[1] 1 2 2 3
 > sc <- as.numeric(unlist(ss))
 > sc
[1] 1 2 3 1
 > mainmat[cbind(sr, sc)] <- 1
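
Put together, the two steps might look like the sketch below. It uses
the ten example values from the question; the names 'words' and
'ncodes' are introduced here for illustration, and the data are assumed
to be character strings (not factors) holding codes between 1 and 92:

```r
# The ten example values given in the question
words <- c("1", "1", "1", "1", "2,3,4", "5", "1", "1", "6", "7,8,9,10")
ncodes <- 92  # number of distinct codes, as stated in the question

# Step 1: a plain matrix of zeros, one row per record, one column per code
mainmat <- matrix(0, nrow = length(words), ncol = ncodes)

# Step 2: build the subscripting matrix
ss <- strsplit(words, split = ",")           # list of code strings per row
sr <- rep(seq_along(ss), sapply(ss, length)) # row index, repeated once per code
sc <- as.numeric(unlist(ss))                 # column index = the code itself

# Indexing with a two-column matrix sets each (row, column) pair in one go
mainmat[cbind(sr, sc)] <- 1

mainmat[, 1]  # 1 1 1 1 0 0 1 1 0 0
mainmat[, 2]  # 0 0 0 0 1 0 0 0 0 0
```

The first two columns match the results asked for in the question, and
no explicit loop over rows or columns is needed.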



On 29/08/2011 14:55, Chris Beeley wrote:
> Hello-
>
> Sorry to ask a basic question, but I've spent many hours on this now
> and seem to be missing something.
>
> I have a loop that looks like this:
>
>      mainmat=data.frame(matrix(data=0, ncol=92, nrow=length(predata$Words_MH)))
>
>      for(i in 1:length(predata$Words_MH)){
>      for(j in 1:92){
>
>      mainmat[i,j]=ifelse(j %in%
> as.numeric(unlist(strsplit(predata$Words_MH[i], split=","))), 1, 0)
>
>      }
>      }
>
> What it's doing is creating a matrix with 92 columns, that's the
> number of different codes, and then for every row of my data it looks
> to see if the code (code 1, code 2, etc.) is in the string and if it
> is, returns a 1 in the relevant column (column 1 for code 1, column 2
> for code 2, etc.)
>
> There are 1000 rows in the database, and I have to run several
> versions of this code, so it just takes way too long, I have been
> trying to rewrite using lapply. I tried this:
>
>      myfunction=function(x, y) ifelse(x %in%
> as.numeric(unlist(strsplit(predata$Words_MH[y], split=","))), 1, 0)
>
>      for(j in 1:92){
>      mainmat[,j]= lapply(predata$Words, myfunction)
>      }
>
> but I don't think I can use something that takes two inputs, and I
> can't seem to remove either.
>
> Here's a dput of the first 10 rows of the variable in case that's helpful:
>
> predata$Words=c("1", "1", "1", "1", "2,3,4", "5", "1", "1", "6", "7,8,9,10")
>
> Given these data, I want the function to return, for the first column,
> 1, 1, 1, 1, 0, 0, 1, 1, 0, 0 (because those are the values of Words
> which contain a 1) and for the second column return 0, 0, 0, 0, 1, 0,
> 0, 0, 0, 0 (because the fifth value is the only one that contains a
> 2).
>
> Any suggestions gratefully received!
>
> Chris Beeley
> Institute of Mental Health, UK

-- 
Patrick Burns
pburns at pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner'
and 'The R Inferno')


