[R] Conditional looping over a set of variables in R

Sun Oct 24 20:54:39 CEST 2010

Whoops, got an extra comma in there somehow; should be:

   apply(d, 1, function(x) match(1, x))
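
For anyone trying this out, here is a minimal sketch of what the corrected call returns. The data frame `d` below is a made-up four-row example (not the one from Bill's message), with entries 0, 1, or NA as Bill assumed. Since apply() hands each row to the function as a vector, match(1, x) gives the position of the first 1 in that row, or NA when the row contains no 1.

```r
# Made-up example data: entries are 0, 1, or NA (missing)
d <- data.frame(X1 = c(1, 0, 0, NA),
                X2 = c(1, 0, 1, NA),
                X3 = c(1, NA, 0, NA))

# For each row, the column position of the first 1 (NA if the row has no 1)
first1 <- apply(d, 1, function(x) match(1, x))

print(as.integer(first1))
# [1]  1 NA  2 NA
```

Rows 2 and 4 contain no 1, so they come back as NA.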

   -Peter Ehlers

On 2010-10-24 08:17, Peter Ehlers wrote:
> This won't be as quick as Bill's elegant solution, but it's a one-liner:
>
>    apply(d, 1, function(x), match(1, x))
>
> See ?match.
>
>     -Peter Ehlers
>
> On 2010-10-22 10:36, David Herzberg wrote:
>> Bill, thanks so much for this. I'll get a chance to test it later today, and will post the outcome.
>>
>>
>> David S. Herzberg, Ph.D.
>> Vice President, Research and Development
>> Western Psychological Services
>> 12031 Wilshire Blvd.
>> Los Angeles, CA 90025-1251
>> Phone: (310)478-2061 x144
>> FAX: (310)478-7838
>> email: davidh at wpspublish.com
>>
>>
>>
>> -----Original Message-----
>> From: William Dunlap [mailto:wdunlap at tibco.com]
>> Sent: Friday, October 22, 2010 9:52 AM
>> To: David Herzberg; r-help at r-project.org
>> Subject: RE: [R] Conditional looping over a set of variables in R
>>
>> You were a bit vague about the format of your data.
>> I'm assuming all columns are numeric and each entry is 0, 1, or NA (missing value).  I made a little function to generate random data in that format for testing purposes:
>>
>> makeData <- function (nrow = 1500, ncol = 140, pMissing = 0.1) {
>>       # pMissing is the proportion of missing values
>>       m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE),
>>           nrow, ncol)
>>       m[runif(nrow * ncol) < pMissing] <- NA
>>       data.frame(m)
>> }
>>
>> E.g.,
>>
>>     >   set.seed(168)
>>     >   d <- makeData(15, 3)
>>     >   d
>>         X1 X2 X3
>>      1   1  1  1
>>      2   0  0 NA
>>      3   0  1  0
>>      4   0  0 NA
>>      5   0  1  1
>>      6   0  0 NA
>>      7   1  0  0
>>      8   0  1  1
>>      9   0  0  1
>>     10   1  1 NA
>>     11   0  0  1
>>     12   0  0  0
>>     13  NA NA NA
>>     14   0  0  0
>>     15   1  0  0
>>
>> I think the following function does what you want.
>> The algorithm is pretty similar to what you showed.
>>
>>     columnOfFirstOne <- function(data) {
>>         # col will be the return value, one entry per row of data.
>>         # Fill it with NA's: NA in output will mean there were no 1's in row
>>         col <- rep(as.integer(NA), nrow(data))
>>         for (j in seq_len(ncol(data))) { # loop over columns
>>             # For each entry in 'col', if it has not been set yet
>>             # and this entry in the j'th column of data is 1 (and not
>>             # missing) then set it to the column number.
>>             col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
>>         }
>>         col # return this from function
>>     }
>>
>> With the above data we get
>>     >   columnOfFirstOne(d)
>>      [1]  1 NA  2 NA  2 NA  1  2  3  1  3 NA NA NA  1
>>
>> It seems quick enough for a dataset of your size
>>     >   dd <- makeData(nrow = 1500, ncol = 140)
>>     >   system.time(columnOfFirstOne(dd)) # time in seconds
>>        user  system elapsed
>>        0.08    0.00    0.08
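
As a quick sanity check, the loop-based function can be compared against the apply()/match() one-liner suggested elsewhere in this thread. The sketch below repeats the two definitions from this message so that it runs on its own; the seed and the data size are arbitrary choices, not values from the original posts, and the names `byLoop`, `byApply`, and `sameAnswer` are just for illustration.

```r
# Random 0/1 data with a proportion pMissing of NA entries
makeData <- function(nrow = 1500, ncol = 140, pMissing = 0.1) {
    m <- matrix(sample(c(1, 0), size = nrow * ncol, replace = TRUE), nrow, ncol)
    m[runif(nrow * ncol) < pMissing] <- NA
    data.frame(m)
}

# Column-wise loop: record the first column in which each row has a 1
columnOfFirstOne <- function(data) {
    col <- rep(NA_integer_, nrow(data))
    for (j in seq_len(ncol(data))) {
        col[is.na(col) & !is.na(data[, j]) & data[, j] == 1] <- j
    }
    col
}

set.seed(1)                              # arbitrary seed, for reproducibility only
dd <- makeData(nrow = 200, ncol = 20)    # arbitrary size, smaller than the real data

byLoop  <- columnOfFirstOne(dd)
byApply <- apply(dd, 1, function(x) match(1, x))

# Both approaches should give the same column number (or NA) for every row
sameAnswer <- identical(byLoop, as.integer(byApply))
print(sameAnswer)
```

If the two disagree on real data, the likely cause is a column that is not numeric, since apply() converts the data frame to a matrix first.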
>>
>> Bill Dunlap
>> Spotfire, TIBCO Software
>> wdunlap tibco.com
>>
>>> -----Original Message-----
>>> From: r-help-bounces at r-project.org
>>> [mailto:r-help-bounces at r-project.org] On Behalf Of David Herzberg
>>> Sent: Friday, October 22, 2010 8:34 AM
>>> To: r-help at r-project.org
>>> Subject: [R] Conditional looping over a set of variables in R
>>>
>>> Here's the problem I'm trying to solve in R: I have a data frame that
>>> consists of about 1500 cases (rows) of data from kids who took a test
>>> of listening comprehension. The columns are their scores (1 = correct,
>>> 0 = incorrect,  . = missing) on 140 test items. The items are numbered
>>> sequentially and are ordered by increasing difficulty as you go from
>>> left to right across the columns. I want R to go through the data and
>>> find the first correct response for each case. Because of basal and
>>> ceiling rules, many cases have missing data on many items before the
>>> first correct response appears.
>>>
>>> For each case, I want R to evaluate the item responses sequentially
>>> starting with item 1. If the score is 0 or missing, proceed to the
>>> next item and evaluate it. If the score is 1, stop the operation for
>>> that case, record the item number of that first correct response in a
>>> new variable, proceed to the next case, and restart the operation.
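
The case-by-case procedure described above can be written almost literally in R with two nested for loops and break. This is only a sketch on made-up data: the three-column data frame `items` and its values are invented for illustration (the column names follow the LC1a_score pattern from the SPSS syntax), and, unlike the SPSS version, cases with no correct response get NA rather than 0.

```r
# Invented item scores: one row per case, columns ordered by difficulty
items <- data.frame(LC1a_score = c(NA, 0, 0),
                    LC2a_score = c(1, 0, NA),
                    LC3a_score = c(1, 1, 0))

m <- as.matrix(items)                       # plain numeric matrix for m[i, j] indexing
LCfirst1 <- rep(NA_integer_, nrow(m))       # NA will mean "no correct response"

for (i in seq_len(nrow(m))) {               # one case at a time
    for (j in seq_len(ncol(m))) {           # items in order, starting with item 1
        if (!is.na(m[i, j]) && m[i, j] == 1) {
            LCfirst1[i] <- j                # record the item number
            break                           # stop scanning this case
        }
    }
}

print(LCfirst1)
# [1]  2  3 NA
```

The is.na() test comes first so that a missing score is skipped rather than compared with 1.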
>>>
>>> In SPSS, this operation would be carried out with LOOP, VECTOR, and DO
>>> IF, as follows (assuming the data set is already loaded):
>>>
>>> * DECLARE A NEW VARIABLE TO HOLD THE ITEM NUMBER OF THE FIRST CORRECT
>>> RESPONSE, SET IT EQUAL TO 0.
>>> numeric LCfirst1.
>>> comp LCfirst1 = 0.
>>>
>>> * DECLARE A VECTOR TO HOLD THE 140 ITEM RESPONSE VARIABLES.
>>> vector x=LC1a_score to LC140a_score.
>>>
>>> * SET UP A LOOP THAT WILL RUN FROM 1 TO 140, AS LONG AS
>>> LCfirst1 = 0. "#i" IS AN INDEX VARIABLE THAT INCREASES BY 1 EACH TIME
>>> THE LOOP RUNS.
>>> loop #i=1 to 140 if (LCfirst1 = 0).
>>>
>>> * SET UP A CONDITIONAL TRANSFORMATION THAT IS EVALUATED FOR EACH
>>> ELEMENT OF THE VECTOR.  THUS, WHEN #i = 1, THE EXPRESSION EVALUATES
>>> THE FIRST ELEMENT OF THE VECTOR (THAT IS, THE FIRST OF THE 140 ITEM
>>> RESPONSES). AS THE LOOP RUNS AND #i INCREASES, SUBSEQUENT VECTOR
>>> ELEMENTS ARE EVALUATED.
>>> THE do if STATEMENT RETAINS CONTROL AND KEEPS LOOPING THROUGH THE
>>> VECTOR UNTIL A '1' IS ENCOUNTERED.
>>> + do if x(#i) = 1.
>>>
>>> * WHEN A '1' IS ENCOUNTERED, CONTROL PASSES TO THE NEXT STATEMENT,
>>> WHICH RECODES THE VALUE OF THAT VECTOR ELEMENT TO '99'.
>>> + comp x(#i) = 99.
>>>
>>> * AND THEN CONTROL PASSES TO THE NEXT STATEMENT, WHICH RECODES THE
>>> VALUE OF LCfirst1 TO THE CURRENT INDEX VALUE, THUS CAPTURING THE ITEM
>>> NUMBER OF THE FIRST CORRECT RESPONSE FOR THAT CASE. CHANGING THE VALUE
>>> OF LCfirst1 ALSO CAUSES THE LOOP TO STOP EXECUTING FOR THAT CASE, AND
>>> THE PROGRAM MOVES TO THE NEXT CASE AND RESTARTS THE LOOP.
>>> + comp LCfirst1 = #i.
>>> + end if.
>>> end loop.
>>> exe.
>>>
>>> After several hours of trying to translate this procedure to R, I'm
>>> stumped. I played around with creating a list to hold the item
>>> responses variables (analogous to 'vector' in SPSS), but when I tried
>>> to use the list in an R procedure, I kept getting a warning along the
>>> lines of 'the list contains > 1 element, only the first element will
>>> be used'. So perhaps a list is not the appropriate class to 'hold'
>>> these variables?
>>>
>>> It seems that some nested arrangement of 'for' 'while' and/or 'lapply'
>>> will allow me to recreate the operation described above? How do I set
>>> up the indexing operation analogous to 'loop #i' in SPSS?
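
On the indexing question: a data frame is itself a list of columns, so no separate holder like the SPSS vector is needed. A for loop over seq_along() plays the role of 'loop #i', and items[[j]] picks out the j'th column as a plain vector. The sketch below uses invented data and a hypothetical counter `nCorrect`; note that if() expects a single TRUE or FALSE, so whole columns are handled with vectorized comparisons instead.

```r
# Invented scores for two items (columns) and three cases (rows)
items <- data.frame(LC1a_score = c(0, 1, NA),
                    LC2a_score = c(1, 1, 0))

nCorrect <- integer(ncol(items))       # one counter per item

for (j in seq_along(items)) {          # j runs over the columns, like #i in SPSS
    scores <- items[[j]]               # j'th column as an ordinary numeric vector
    nCorrect[j] <- sum(scores == 1, na.rm = TRUE)   # vectorized test, NAs ignored
}

print(nCorrect)
# [1] 1 2
```

Columns can also be picked by name, for example items[["LC2a_score"]].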
>>>
>>> Any help is appreciated, and I'm happy to provide more information if
>>> needed.
>>>
>>> David S. Herzberg, Ph.D.
>>> Vice President, Research and Development
>>> Western Psychological Services
>>> 12031 Wilshire Blvd.
>>> Los Angeles, CA 90025-1251
>>> Phone: (310)478-2061 x144
>>> FAX: (310)478-7838
>>> email: davidh at wpspublish.com
>>>