[R] How to simulate informative censoring in a Cox PH model?
Daniel Meddings
dpmeddings at gmail.com
Wed Jul 22 10:20:47 CEST 2015
I wish to simulate event times with informative censoring, and to compare
the quality of the Cox PH parameter estimates with that of estimates
obtained from event times generated with non-informative censoring.
However, I am struggling to do this, and I suspect the problem is not a
technical flaw in my code but that I do not understand what is meant by
informative and non-informative censoring.
My approach is to simulate an event time T dependent on a vector of
covariates x having hazard function h(t|x)=lambda*exp(beta'*x)*v*t^{v-1}.
This corresponds to T~ Weibull(lambda(x),v), where the scale parameter
lambda(x)=lambda*exp(beta'*x) depends on x and the shape parameter v is
fixed. I have N subjects where T_{i}~ Weibull(lambda(x_{i}),v_{T}),
lambda(x_{i})=lambda_{T}*exp(beta_{T}'*x_{i}), for i=1,...,N. Here I assume
the regression coefficients are p-dimensional.
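
A minimal sketch of drawing such an event time by inverse-transform
sampling, written in Python for brevity. The function name sim_event_time
and all parameter values are illustrative assumptions, not taken from the
original simulation code.

```python
import math
import random

def sim_event_time(x, beta, lam, v, rng):
    """Draw T with hazard lam*exp(beta'x)*v*t^(v-1).

    The survival function is S(t|x) = exp(-lam*exp(beta'x)*t^v), so solving
    S(T|x) = U for U ~ Uniform(0, 1] gives T = (-log(U)/rate)^(1/v).
    """
    rate = lam * math.exp(sum(b * xj for b, xj in zip(beta, x)))
    u = 1.0 - rng.random()          # in (0, 1], so log(u) is finite
    return (-math.log(u) / rate) ** (1.0 / v)

rng = random.Random(1)
x_i = [0.5, -1.0]                   # illustrative covariate vector, p = 2
t_i = sim_event_time(x_i, beta=[0.7, 0.2], lam=0.1, v=1.5, rng=rng)
```

In R's rweibull(n, shape, scale) parametrization, where
S(t)=exp(-(t/scale)^shape), the same distribution has shape=v and
scale=rate^(-1/v).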
I generate informative censoring times C_i~ Weibull(lambda(x_i),v_C),
lambda(x_i)=lambda_C*exp(beta_C'*x_i), and compute Y_inf_i=min(T_i,C_i) and
an event flag delta_inf_i=1 if T_i <= C_i (an observed event), and
delta_inf_i=0 if T_i > C_i (informatively censored: event not
observed). I am convinced this is informative censoring because, as long as
beta_T != 0 and beta_C != 0, the data generating processes for T and C
both depend on x for each subject.
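
A sketch of forming the observed time and the event flag under
covariate-dependent censoring, in Python for brevity. The helper names
(draw_weibull_ph, observe), the single scalar covariate and all parameter
values are assumptions made for illustration only.

```python
import math
import random

def draw_weibull_ph(rate, v, rng):
    """Inverse-transform draw with survival function exp(-rate * t^v)."""
    u = 1.0 - rng.random()             # in (0, 1]
    return (-math.log(u) / rate) ** (1.0 / v)

def observe(t, c):
    """Return (Y, delta): follow-up time and event flag.

    The flag compares the latent times. Y = min(t, c) never exceeds c,
    so testing Y <= c would flag every subject as an event.
    """
    return min(t, c), 1 if t <= c else 0

rng = random.Random(2)
lam_T, beta_T, v_T = 0.1, 0.7, 1.5     # illustrative values
lam_C, beta_C, v_C = 0.05, -0.5, 1.5   # illustrative values
data = []
for _ in range(1000):
    x = rng.gauss(0.0, 1.0)            # one scalar covariate for brevity
    t = draw_weibull_ph(lam_T * math.exp(beta_T * x), v_T, rng)
    c = draw_weibull_ph(lam_C * math.exp(beta_C * x), v_C, rng)
    y, delta = observe(t, c)
    data.append((x, y, delta))

# Share of subjects whose event was not observed.
censored_share = 1 - sum(d for _, _, d in data) / len(data)
```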
In contrast, I generate non-informative censoring times
D_i~Weibull(lambda_D*exp(beta_D),v_D), and compute Y_ninf_i=min(T_i,D_i)
and an event flag delta_ninf_i=1 if T_i <= D_i (an observed event),
and delta_ninf_i=0 if T_i > D_i (non-informatively censored: event not
observed). Here beta_D is a scalar. I "scale" the simulation by choosing
the lambda_T, lambda_C and lambda_D parameters so that T_i<C_i and T_i<D_i
often enough to leave X% of subjects censored in both Y_inf_i and
Y_ninf_i.
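
One way such a scaling could be done is to bisect on the censoring rate
parameter until the simulated censored share hits the target. The sketch
below (Python, for brevity) is an assumption about how to do this, not the
procedure actually used. It holds the uniform draws fixed so that the
censored share is monotone in lam_c, uses one scalar covariate, and treats
the covariate-free case as beta_c = 0, with the constant exp(beta_D)
absorbed into the rate. All names and values are illustrative.

```python
import math
import random

def draw(rate, v, u):
    """Inverse-transform Weibull draw with S(t) = exp(-rate * t^v)."""
    return (-math.log(u) / rate) ** (1.0 / v)

def censored_fraction(lam_c, beta_c, v_c, lam_t, beta_t, v_t, xs, u_t, u_c):
    """Share of subjects whose censoring time precedes the event time."""
    n_cens = 0
    for x, ut, uc in zip(xs, u_t, u_c):
        t = draw(lam_t * math.exp(beta_t * x), v_t, ut)
        c = draw(lam_c * math.exp(beta_c * x), v_c, uc)
        n_cens += c < t
    return n_cens / len(xs)

def calibrate_lam_c(target, beta_c, v_c, lam_t, beta_t, v_t, xs, u_t, u_c,
                    lo=1e-6, hi=1e3, iters=60):
    """Bisect on the log scale for the lam_c giving the target share.

    With the uniforms held fixed, every censoring time shrinks as lam_c
    grows, so the censored share is non-decreasing in lam_c.
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        share = censored_fraction(mid, beta_c, v_c, lam_t, beta_t, v_t,
                                  xs, u_t, u_c)
        if share < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

rng = random.Random(3)
n = 5000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
u_t = [1.0 - rng.random() for _ in range(n)]   # uniforms in (0, 1]
u_c = [1.0 - rng.random() for _ in range(n)]
# Remaining arguments: v_c, lam_t, beta_t, v_t (illustrative values).
args = (1.5, 1.0, 0.7, 1.5, xs, u_t, u_c)
lam_C = calibrate_lam_c(0.30, -0.5, *args)     # covariate-dependent censoring
lam_D = calibrate_lam_c(0.30, 0.0, *args)      # covariate-free censoring
```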
The problem is that even for, say, 30% censoring (which I think is high),
the Cox PH parameter estimates using both Y_inf and Y_ninf are unbiased,
whereas I expected the estimates using Y_inf to be biased. I think I see
why:
however different beta_C is from beta_T, a censored subject can presumably
influence the estimation of beta_T only by affecting the set of subjects at
risk at any time t, but this does not change the fact that every single
Y_inf_i with delta_inf_i=1 will have been generated using beta_T only. Thus
I do not see how my simulation can possibly produce biased estimates for
beta_T using Y_inf.
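
The comparison described above can be sketched end to end as follows
(Python, for brevity). This is a hypothetical reconstruction rather than
the original code: it uses one scalar covariate, fits the Cox model with a
hand-written Newton-Raphson on the partial likelihood (assuming no tied
times) instead of a library routine, and all names and parameter values
are illustrative. A single replicate is shown; judging bias would need
the average over many replicates.

```python
import math
import random

def draw(rate, v, rng):
    """Inverse-transform Weibull draw with S(t) = exp(-rate * t^v)."""
    u = 1.0 - rng.random()
    return (-math.log(u) / rate) ** (1.0 / v)

def cox_fit_1d(y, delta, x, iters=25, tol=1e-10):
    """Newton-Raphson on the Cox partial likelihood, one covariate."""
    # Visit subjects from latest to earliest time so the risk set grows.
    order = sorted(range(len(y)), key=lambda i: y[i], reverse=True)
    beta = 0.0
    for _ in range(iters):
        s0 = s1 = s2 = 0.0          # risk-set sums of w, w*x, w*x^2
        score = info = 0.0
        for i in order:
            w = math.exp(beta * x[i])
            s0 += w
            s1 += w * x[i]
            s2 += w * x[i] * x[i]
            if delta[i] == 1:       # only events contribute terms
                mean = s1 / s0
                score += x[i] - mean
                info += s2 / s0 - mean * mean
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

def one_fit(n, beta_t, beta_c, lam_c, rng):
    """Simulate one data set and return the Cox estimate of beta_t."""
    x, y, delta = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        t = draw(1.0 * math.exp(beta_t * xi), 1.5, rng)
        c = draw(lam_c * math.exp(beta_c * xi), 1.5, rng)
        x.append(xi)
        y.append(min(t, c))
        delta.append(1 if t <= c else 0)
    return cox_fit_1d(y, delta, x)

rng = random.Random(4)
# Censoring that depends on x versus censoring that does not.
est_dep = one_fit(3000, beta_t=0.5, beta_c=-1.0, lam_c=0.4, rng=rng)
est_free = one_fit(3000, beta_t=0.5, beta_c=0.0, lam_c=0.4, rng=rng)
```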
But then what is informative censoring if not based on this approach?
Any help would be greatly appreciated.