[R] wrapper for coxph with a subset argument
Christos Hatzis
christos at nuverabio.com
Fri Nov 9 18:47:41 CET 2007
In terms of recommended approach, I think it would be easier to generate
subsetting conditions as (lists of) logical vectors and use those as
suggested before. It seems to me more cumbersome to use the data to
generate logical conditions as text strings and then parse those within your
wrapper to get back the logical vector.
-Christos
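
A minimal sketch of the logical-vector approach described above, under stated
assumptions: the data frame is rebuilt inline to mirror makeTestDF() from the
quoted message, the names conds and fits are hypothetical, and the event
probability is raised to 0.3 (the original used .1) so that each subset has
enough events to fit.

```r
library(survival)

# Hypothetical test data, modeled on makeTestDF() in the quoted message.
# Assumption: prob = 0.3 instead of .1 so every subset has enough events.
set.seed(1)
n <- 200
testdf <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.3),
  trt   = rep(c("A", "B"), each = n / 2),
  sex   = factor(rep(c("M", "F"), times = n / 2))
)

# Subset conditions stored as a named list of logical vectors,
# one element per model to be fitted.
conds <- list(
  female = testdf$sex == "F",
  male   = testdf$sex == "M",
  early  = testdf$times <= 100
)

# Fit one Cox model per condition. The rows are selected before
# coxph() is called, so no parsing or substitution is needed.
fits <- lapply(conds, function(keep) {
  coxph(Surv(times, event) ~ trt, data = testdf[keep, , drop = FALSE])
})
```

Each element of fits is an ordinary coxph object, so something like
lapply(fits, summary) should work on the whole list. Because the conditions
are plain logical vectors, they can be generated programmatically from the
data without going through text strings.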
> -----Original Message-----
> From: r-help-bounces at r-project.org
> [mailto:r-help-bounces at r-project.org] On Behalf Of Erik Iverson
> Sent: Friday, November 09, 2007 12:05 PM
> To: 'r-help at stat.math.ethz.ch'
> Subject: [R] wrapper for coxph with a subset argument
>
> Dear R-help -
>
> Thanks to those who replied yesterday (Christos H. and Thomas
> L.) regarding my question on coxph and model formula; the
> answers worked perfectly.
>
> My new question involves the following.
>
> I want to run several coxph models (package survival) with
> the same dataset, but different subsets of that dataset.
>
> I have found a way to do this, described below in functions
> subwrap1 and subwrap2. These do not use the coxph "subset"
> argument, however, as you will see.
>
> My three main questions are:
>
> 1) When writing a wrapper like this, should I be using the
> subset argument in coxph(), or alternatively, doing what I am
> doing in subwrap1 and subwrap2 below? Is the subset argument
> in coxph more of a convenience tool for interactive use
> rather than programs?
>
> 2) If the approach in subwrap1 and subwrap2 is fine, is there
> a preference for using 'expressions' or 'strings'?
> Eventually, my program will create these subset conditions
> programmatically, so I think strings will be the way I have
> to go, even though I've seen warnings on this list about
> using the eval(parse()) construct.
>
> 3) Is there some approach to do this that I'm overlooking?
> My goal will be to produce a list of subset conditions
> (probably a character vector), and then use lapply to run the
> various cox regressions.
>
> I can already achieve my goal; I would just like to know more
> details about how others do things like this.
>
> I've simplified my code below to focus on where I feel I'm confused.
> Here is some code along with comments:
>
> #### BEGIN R SAMPLE CODE
>
> # Function for producing test data
> makeTestDF <- function(n) {
>   times <- sample(1:200, n, replace = TRUE)
>   event <- rbinom(n, 1, prob = .1)
>   trt <- rep(c("A","B"), each = n/2)
>   sex <- factor(c("M","F"))
>   sex <- rep(sex, times = n/2)
>   testdf <- data.frame(times,event,trt,sex)
> }
>
> # Make test data, n = 200
> testdf <- makeTestDF(200)
>
> # Cox wrapper function with subset, this one works
> # Takes subset as expression
> subwrap1 <- function(x, sb) {
>   sb <- eval(substitute(sb), x)
>   x <- x[sb,]
>   coxph(Surv(times,event)~trt, data = x)
> }
>
> subwrap1(testdf, sex == 'F')
>
> # This next one also works, but uses a character variable
> # instead of an expression as the subset argument
>
> subwrap2 <- function(x, sb) {
>   sb <- eval(parse(text = sb), x)
>   x <- x[sb,]
>   coxph(Surv(times,event)~trt, data = x)
> }
>
> subwrap2(testdf, "sex == 'F'")
>
> # Neither of the above use the coxph subset argument
> # If I try using that, I get stuck with expressions,
> # I've tried many different things in the subset argument,
> # but none seem to do the trick. Is using this argument
> # in a program even advisable?
>
> subwrap3 <- function(x, sb) {
>   coxph(Surv(times,event)~trt, data = x,
>         subset = eval(substitute(sb), x))
> }
>
> subwrap3(testdf, sex == 'F') #does not work
>
> # Using a string, this works, however.
>
> subwrap4 <- function(x, sb) {
>   coxph(Surv(times,event)~trt, data = x,
>         subset = eval(parse(text=sb)))
> }
>
> subwrap4(testdf, "sex == 'F'")
>
> ### END R SAMPLE CODE
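
One possible way to keep using coxph()'s own subset argument inside a wrapper
is to splice the unevaluated condition into the call before evaluating it.
This is only a sketch under stated assumptions: subwrap5 is a hypothetical
name that does not appear in the original post, the test data are rebuilt
inline, and the event probability is 0.3 rather than .1 so the subsets have
enough events.

```r
library(survival)

# Hypothetical test data, modeled on makeTestDF() above
# (assumption: prob = 0.3 instead of .1).
set.seed(1)
n <- 200
testdf <- data.frame(
  times = sample(1:200, n, replace = TRUE),
  event = rbinom(n, 1, prob = 0.3),
  trt   = rep(c("A", "B"), each = n / 2),
  sex   = factor(rep(c("M", "F"), times = n / 2))
)

# subwrap5 (hypothetical): capture the condition unevaluated, insert it
# into the coxph() call in place of SB, then evaluate the finished call.
subwrap5 <- function(x, sb) {
  cl <- substitute(
    coxph(Surv(times, event) ~ trt, data = x, subset = SB),
    list(SB = substitute(sb))
  )
  eval(cl)
}

fit_f <- subwrap5(testdf, sex == "F")

# Conditions built programmatically can be kept as quoted expressions
# and spliced in the same way with bquote(), avoiding eval(parse()).
conds <- list(quote(sex == "F"), quote(sex == "M"))
fits <- lapply(conds, function(cond) {
  eval(bquote(coxph(Surv(times, event) ~ trt, data = testdf, subset = .(cond))))
})
```

With this construction the call that coxph() receives contains the literal
condition (for example sex == "F"), so it is evaluated against the data in
the usual way, and the stored call in the fitted object shows the actual
subset that was used.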
>
> Thanks so much,
> Erik Iverson
> iverson at biostat.wisc.edu
>
> > sessionInfo()
> R version 2.5.1 (2007-06-27)
> i686-pc-linux-gnu
>
> locale:
> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
>
> attached base packages:
> [1] "grDevices" "datasets"  "tcltk"     "splines"   "graphics"  "utils"
> [7] "stats"     "methods"   "base"
>
> other attached packages:
>        debug     mvbutils SPLOTS_1.2-6        Hmisc        chron     survival
>      "1.1.0"      "1.1.1"      "1.2-6"      "3.4-2"     "2.3-13"       "2.32"
>