[R] standard error of survfit.coxph()

Therneau, Terry M., Ph.D. therneau at mayo.edu
Mon Jun 30 15:04:45 CEST 2014


1. The computations "behind the scenes" produce the variance of the cumulative hazard. 
This is true for both an ordinary Kaplan-Meier and a Cox model.  Transformations to other 
scales are done using simple Taylor series.

   H = cumulative hazard = -log(S);  S = survival
   var(H) = var(log(S))  = the starting point
   S = exp(log(S)), so  var(S) is approx [deriv of exp(x)]^2 * var(log(S)) = S^2 var(H)
   log(-log(S)) = log(H), so  var(log(-log(S))) is approx [deriv of log(x)]^2 * var(H) = (1/H^2) var(H)
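
As a rough illustration of these Taylor-series (delta method) steps, here is a minimal sketch in base R. The helper name se_scales and the input numbers are made up for the example and are not part of the survival package; it assumes you already have S and the standard error of H = -log(S) at the time points of interest.

```r
# Delta-method conversion of se(H) to other scales.
# Illustrative helper only; not a function from the survival package.
se_scales <- function(S, se_H) {
  H <- -log(S)                  # cumulative hazard
  data.frame(
    S         = S,
    se_logS   = se_H,           # var(log S) = var(H): sign does not matter
    se_S      = S * se_H,       # var(S) ~ S^2 * var(H)
    se_loglog = se_H / H        # var(log(-log S)) ~ var(H) / H^2
  )
}

# Made-up inputs, purely for illustration
res <- se_scales(S = c(0.9, 0.5), se_H = c(0.04, 0.10))
print(res)
```

With a survfit object sf, the inputs would presumably be sf$surv and sf$std.err, assuming std.err is on the cumulative hazard scale as ?survfit.object describes. The log(-log(S)) result is undefined where S = 1 (H = 0), for example before the first event.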

2. At the time it was written, summary.survfit was used only for printing out the survival 
curve at selected times, and the audience for the printout wanted std(S).   True, that was 
20 years ago, but I don't recall anyone ever asking for summary to do anything else.  Your 
request is not a bad idea.
   Note however that the primary impact of using the log(S), S, or log(-log(S)) scale is on 
the confidence intervals, and they do appear per request in the summary output.
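
To make the effect of the scale on the intervals concrete, here is a small sketch in base R that builds limits for S on the three scales from a single S and se(H), with H = -log(S). The helper name ci_scales and the numbers are hypothetical; this is the textbook construction, and unlike survfit it does not truncate limits to [0, 1].

```r
# Confidence limits for S on the plain, log and log-log scales,
# given a single value of S and se(H), where H = -log(S).
# Illustrative helper only; not a function from the survival package.
ci_scales <- function(S, se_H, level = 0.95) {
  z     <- qnorm(1 - (1 - level) / 2)
  H     <- -log(S)
  se_ll <- se_H / H             # se of log(-log S) by the delta method
  rbind(
    plain     = c(lower = S - z * S * se_H,
                  upper = S + z * S * se_H),
    log       = c(lower = S * exp(-z * se_H),
                  upper = S * exp( z * se_H)),
    `log-log` = c(lower = exp(-H * exp( z * se_ll)),
                  upper = exp(-H * exp(-z * se_ll)))
  )
}

# Made-up inputs, purely for illustration
ci <- ci_scales(S = 0.8, se_H = 0.05)
print(round(ci, 4))
```

The plain interval is symmetric around S, while the log and log-log intervals are not. The log-log limits always stay inside (0, 1), which is one common reason for choosing that scale.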

Terry T.


On 06/28/2014 05:00 AM, r-help-request at r-project.org wrote:
> Message: 9
> Date: Fri, 27 Jun 2014 12:39:29 -0700
> From: array chip<arrayprofile at yahoo.com>
> To:"r-help at r-project.org"  <r-help at r-project.org>
> Subject: [R] standard error of survfit.coxph()
> Message-ID:
> 	<1403897969.91269.YahooMailNeo at web122906.mail.ne1.yahoo.com>
> Content-Type: text/plain
>
> Hi, can anyone help me to understand the standard errors printed in the output of survfit.coxph()?
>
> library(survival)
> time<-sample(1:15,100,replace=TRUE)
>
> status<-as.numeric(runif(100,0,1)<0.2)
> x<-rnorm(100,10,2)
>
> fit<-coxph(Surv(time,status)~x)
> ### method 1
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$std.err
>
>              [,1]        [,2]        [,3]        [,4]       [,5]
>  [1,] 0.000000000 0.000000000 0.000000000 0.000000000 0.00000000
>  [2,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [3,] 0.008627644 0.008567253 0.008773699 0.009354788 0.01481819
>  [4,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [5,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [6,] 0.013800603 0.013767977 0.013889971 0.014379928 0.02353371
>  [7,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [8,] 0.030226811 0.030423883 0.029806263 0.028918817 0.05191161
>  [9,] 0.036852571 0.037159980 0.036186931 0.034645002 0.06485394
> [10,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [11,] 0.044181716 0.044621159 0.043221145 0.040872939 0.07931028
> [12,] 0.055452631 0.056018832 0.054236881 0.051586391 0.10800413
> [13,] 0.070665160 0.071363749 0.069208056 0.066655730 0.14976433
> [14,] 0.124140400 0.125564637 0.121281571 0.118002021 0.30971860
> [15,] 0.173132357 0.175309455 0.168821266 0.164860523 0.46393111
>
> survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log')$time
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
>
> ### method 2
>
> summary(survfit(fit, newdata=data.frame(time=time,status=status,x=x)[1:5,], conf.type='log'),time=10)$std.err
>
>               1          2          3          4          5
> [1,] 0.04061384 0.04106186 0.03963184 0.03715246 0.06867532
>
> From reading the help pages ?survfit.object and ?summary.survfit, the standard error in the output of method 1 (survfit()) is for the cumulative hazard, -log(survival), while the standard error in the output of method 2 (summary.survfit()) is for survival itself, regardless of the value chosen for "conf.type" ('log', 'log-log' or 'plain'). This explains why the standard errors differ between method 1 (10th row) and method 2.
>
> My question is how do I get standard error estimates for log(-log(survival))?
>
> Thanks!
>
> John


