[R] How to simulate informative censoring in a Cox PH model?

Fri Jul 24 18:14:13 CEST 2015

Hi Greg

Many thanks for taking the time to respond to my query. You are right about
pointing out the distinction between what variables are and are not
included in the event times process and in the censoring process. I clearly
forgot this important aspect. I amended my code to do as you advise and now
I am indeed getting biased estimates when using the informatively censored
responses. The problem is now that the estimates from the independently
censored responses are the same - i.e. they are just as biased. Thus the
bias seems to be due entirely to model mis-specification and not the
informative censoring.

To give a concrete example I simulate event times T_i and censoring times
C_i from the following models;

T_i~ Weibull(lambda_t(x),v_t),    lambda_t(x)=lambda_t*exp( beta_t_0 +
(beta_t_1*Treat_i) + (beta_t_2*Z_i) + (beta_t_3*Treat_i*Z_i)  )

C_i~ Weibull(lambda_c(x),v_c),    lambda_c(x)=lambda_c*exp( beta_c_0 +
(beta_c_1*Treat_i) + (beta_c_2*Z_i) + (beta_c_3*Treat_i*Z_i)  )

D_i~Weibull(lambda_d(x),v_D), lambda_d(x)=lamda_d*exp( beta_d_0)

where ;

beta_t_0 = 1,  beta_t_1 = -1,   beta_t_2 = 1,  beta_t_3 = -2,   lambda_t=0.5

beta_c_0 = 0.2,  beta_c_1 = -2,   beta_c_2 = 2,  beta_c_3 = -2,
lambda_c=0.5

beta_d_0 = -0.7,  lambda_d=0.1

When I fit the cox model to both the informatively censored responses and
the independent censored responses I include only the Treatment covariate
in the model.

I simulate Treatment from a Bernoulli distribution with p=0.5 and Z_i from
a beta distribution so that Z ranges from 0 to 1 (I like to think of Z as a
"poor" prognosis probability where Z_i=1 means a subject is 100% certain to
have a poor prognosis and Z_i=0 means zero chance). These parameter choices
give approximately 27% and 25% censoring for the informatively censored
responses (using C_i) and the independent censored responses (using D_i)
respectively. I use N=2000 subjects and 2000 simulation replications.

The above simulation I get estimates of beta_t_2 of -1.526 and -1.537 for
the informatively censored responses and the independent censored responses
respectively.

Furthermore when I fit a cox model to the full responses (no censoring at
all) I get an estimate of beta_t_2 of -1.542. This represents the best that
can possibly be done given that Z and Treat*Z are not in the model. Clearly
censoring is not making much of a difference here - model mis-specification
dominates.

I still must be doing something wrong but I cannot figure this one out.

Thanks

Dan

On Thu, Jul 23, 2015 at 12:33 AM, Greg Snow <538280 at gmail.com> wrote:

> I think that the Cox model still works well when the only information
> in the censoring is conditional on variables in the model.  What you
> describe could be called non-informative conditional on x.
>
> To really see the difference you need informative censoring that
> depends on something not included in the model.  One option would be
> to use copulas to generate dependent data and then transform the
> values using your Weibul.  Or you could generate your event times and
> censoring times based on x1 and x2, but then only include x1 in the
> model.
>
> On Wed, Jul 22, 2015 at 2:20 AM, Daniel Meddings <dpmeddings at gmail.com>
> wrote:
> > I wish to simulate event times where the censoring is informative, and to
> > compare parameter estimator quality from a Cox PH model with estimates
> > obtained from event times generated with non-informative censoring.
> However
> > I am struggling to do this, and I conclude rather than a technical flaw
> in
> > my code I instead do not understand what is meant by informative and
> > un-informative censoring.
> >
> > My approach is to simulate an event time T dependent on a vector of
> > covariates x having hazard function h(t|x)=lambda*exp(beta'*x)v*t^{v-1}.
> > This corresponds to T~ Weibull(lambda(x),v), where the scale parameter
> > lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is
> > fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}),
> > lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I
> assume
> > the regression coefficients are p-dimensional.
> >
> > I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C),
> > lambda(x_i)=lambda_C*exp(beta_C'*x_i) and compute Y_inf_i=min(T_i,C_i)
> and
> > a censored flag delta_inf_i=1 if Y_inf_i <= C_i (an observed event), and
> > delta_inf_i=0 if Y_inf_i > C_i (informatively censored: event not
> > observed). I am convinced this is informative censoring because as long
> as
> > beta_T~=0 and beta_C~=0 then for each subject the data generating process
> > for T and C both depend on x.
> >
> > In contrast I generate non-informative censoring times
> > D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i)
> > and a censored flag delta_ninf_i=1 if Y_ninf_i <= D_i (an observed
> event),
> > and delta_ninf_i=0 if Y_ninf_i > D_i (non-informatively censored: event
> not
> > observed). Here beta_D is a scalar. I "scale" the simulation by choosing
> > the lambda_T, lambda_C and lambda_D parameters such that on average
> T_i<C_i
> > and T_i<D_i to achieve X% of censored subjects for both Y_inf_i and
> > Y_ninf_i.
> >
> > The problem is that even for say 30% censoring (which I think is high),
> the
> > Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased when
> I
> > expected the estimates using Y_inf to be biased, and I think I see why:
> > however different beta_C is from beta_T, a censored subject can
> presumably
> > influence the estimation of beta_T only by affecting the set of subjects
> at
> > risk at any time t, but this does not change the fact that every single
> > Y_inf_i with delta_inf_i=1 will have been generated using beta_T only.
> Thus
> > I do not see how my simulation can possibly produce biased estimates for
> > beta_T using Y_inf.
> >
> > But then what is informative censoring if not based on this approach?
> >
> > Any help would be greatly appreciated.
> >
> >         [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Gregory (Greg) L. Snow Ph.D.
> 538280 at gmail.com
>

	[[alternative HTML version deleted]]