[R] Selecting one row or multiple rows per ID

Dieter Menne dieter.menne at menne-biomed.de
Wed Mar 4 10:09:32 CET 2009


Vedula, Satyanarayana <svedula <at> jhsph.edu> writes:

> 
> 
> I need to select one row per patient i in clinic j. The data is organized
> similar to that shown below.
> 
... 
> If patient has outcome recorded at visit 2, then outcome = outcome columns at visit 2
> If patient does not have visit 2, then outcome = outcome at visit 5
> If patient does not have visit 2 and visit 5, then outcome = outcome at visit ... other rules

I prefer a table-driven approach here, because it is easy to get lost in
all these ifs, and medical research requires well-defined documentation
of how the outcome was chosen.

So I first convert the data to wide format; alternatively, you could use
function cast in package reshape for this, but I can never get its
syntax right. I also prefer to do most of this preparatory work at the
database level, e.g. with PIVOT.
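If neither reshape() nor cast cooperates, a base-R sketch with tapply() also produces the wide layout: indexing by the two factors gives a patient-by-visit matrix with NA wherever a visit is missing. It assumes at most one record per patient and visit; the helper function(x) x[1] would silently keep only the first of any duplicates.

```r
# Same example data as in the code further down
outc <- data.frame(
  patclin = factor(paste(c(1, 1, 1, 1, 3, 3, 3, 3),
                         c(1, 3, 3, 3, 5, 5, 5, 5), sep = ".")),
  vis     = factor(c(2, 1, 2, 3, 1, 3, 4, 5)),
  outcom  = c(22, 21, 21, 20, 24, 21, 22, 22))

# Rows = patient.clinic, columns = visit; cells with no record become NA.
# Assumes at most one record per patient/visit pair.
wide <- with(outc, tapply(outcom, list(patclin, vis), function(x) x[1]))
wide
```

Because vis is a factor, the columns come out in level order (visits 1 to 5), so no column reordering is needed; note that the result is a matrix, not a data frame.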

Create a translation table that maps each of the 2^5 = 32 possible
combinations of recorded/missing visits to the column you select; then
you can be sure you have not forgotten any combination.
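A sketch of such a complete translation table, built with expand.grid() so that every pattern appears exactly once. The priority order is an assumption: visits 2 and 5 come from the question, while the fallback to visits 1, 3, 4 is only a hypothetical placeholder for the "other rules" that were elided.

```r
# Every recorded/missing pattern over visits 1..5: 2^5 = 32 rows
pattern <- expand.grid(rep(list(0:1), 5))
names(pattern) <- paste0("v", 1:5)
# Code string in the "01000" form, first digit = visit 1
pattern$code <- apply(pattern, 1, paste, collapse = "")

# 2 and 5 from the question; 1, 3, 4 are a HYPOTHETICAL fallback order
priority <- c(2, 5, 1, 3, 4)
pick <- function(present) {
  hit <- priority[present[priority] == 1]  # recorded visits, in priority order
  if (length(hit) == 0) NA_integer_ else as.integer(hit[1])
}
pattern$visit <- apply(pattern[, 1:5], 1, pick)

usevisit <- pattern[, c("code", "visit")]
usevisit
```

The codes have the same form as outw$code, so this table should be usable in the merge() step; the all-missing pattern "00000" maps to NA, and patterns that no patient has simply match no rows.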

Dieter



outc = data.frame(
  patclin = as.factor(
         paste(c(1,1,1,1,3,3,3,3),
               c(1,3,3,3,5,5,5,5),sep=".")), 
  vis  = as.factor(c(2,1,2,3,1,3,4,5)),
  outcom = c(22,21,21,20,24,21,22,22))

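# wide format: one row per patient.clinic, one outcom.<visit> column per visit
outw = reshape(outc,v.names="outcom",idvar="patclin",timevar="vis",
  direction="wide")
# sort columns: outcom.1 ... outcom.5, then patclin
outw = outw[,order(names(outw))]
# I am sure there is a more elegant way to do this
# I prefer to do this type of work at the database level
# code: one digit per visit, 1 = outcome recorded, 0 = missing (e.g. "01000")
outw$code= as.factor(
  apply(sapply(outw[,1:5],function(x){as.integer(!is.na(x))}),1,paste,
  collapse=""))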

# Note: the visit values here are not exactly what you requested;
# use your own logic to choose the visit for each code
usevisit = data.frame(code=levels(outw$code),visit=c(2,3,4))
outw = merge(usevisit,outw)
outw

# You get a documented table of the visit chosen for each pattern
# and can use visit to select the outcome column
#   code visit outcom.1 outcom.2 outcom.3 outcom.4 outcom.5 patclin
#1 01000     2       NA       22       NA       NA       NA     1.1
#2 10111     3       24       NA       21       22       22     3.5
#3 11100     4       21       21       20       NA       NA     1.3
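The last step, picking each patient's outcome from the column named by visit, can be done with matrix indexing. To keep the sketch self-contained, it rebuilds the merged table from the printout above instead of rerunning the whole script; the substr() check is an extra safeguard that is not part of the original code.

```r
# The merged table as printed above
outw <- data.frame(
  code     = c("01000", "10111", "11100"),
  visit    = c(2, 3, 4),
  outcom.1 = c(NA, 24, 21),
  outcom.2 = c(22, NA, 21),
  outcom.3 = c(NA, 21, 20),
  outcom.4 = c(NA, 22, NA),
  outcom.5 = c(NA, 22, NA),
  patclin  = c("1.1", "3.5", "1.3"))

# Numeric matrix of the outcome columns, in visit order 1..5
vals <- as.matrix(outw[, paste0("outcom.", 1:5)])
# One (row, column) pair per patient selects the chosen visit's outcome
outw$outcome <- vals[cbind(seq_len(nrow(vals)), outw$visit)]

# Safeguard: is the chosen visit actually recorded for that code?
ok <- substr(outw$code, outw$visit, outw$visit) == "1"
outw[, c("patclin", "code", "visit", "outcome")]
```

With the illustrative visits from the post, the third row (code 11100, visit 4) yields NA, and ok flags it because visit 4 is not recorded in that pattern; a real translation table should only map a code to a visit whose digit is 1.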



