[R] survival analysis: interval censored data
Terry Therneau
therneau at mayo.edu
Wed Sep 28 23:03:17 CEST 2011
David,
Thanks for taking time to answer this.
When there are interval censored data, computation of the MLE first
finds a minimal number of time points such that every interval with a
death contains one of those points. Those are the points at which the
final curve can have a jump. The algorithm, more or less is:
1. On the real line, put a pair of parenthesis down for each
interval with an event "(" at the left end ")" at the right.
2. Now scan down the line for () pairs (pairs with nothing else in
the enclosed interval): the center of each will be one of the jump
points.
With slight variations for left truncation.
Terry T.
On Wed, 2011-09-28 at 16:33 -0400, David Winsemius wrote:
> On Sep 28, 2011, at 10:56 AM, Ruth Arias wrote:
>
> >
> >
> > hallo terry:
> >
> > I attached araceae data set,
>
> The usual survival analysis via the Kaplan-Meier method only make
> estimates at the time of events. When you tabulate your data, you see
> that there were no events for the missing (starting) "time" rows in
> those categories during the intervals that you are questioning as
> missing:
>
> xtabs( ~ time+time2+categoria+event, data=araceae)
> , , categoria = C, event = 0
>
> time2
> time 2005 2006 2007 2008 2009 2010
> 2004 0 23 1 3 1 22
> 2005 0 0 0 0 0 0
> 2007 0 0 0 0 4 19
> 2008 0 0 0 0 0 0
> 2009 0 0 0 0 0 0
>
> , , categoria = E, event = 0
>
> time2
> time 2005 2006 2007 2008 2009 2010
> 2004 0 22 0 7 3 21
> 2005 0 0 1 1 0 0
> 2007 0 0 0 0 0 29
> 2008 0 0 0 0 0 0
> 2009 0 0 0 0 0 1
>
>
> , , categoria = C, event = 1
>
> time2
> time 2005 2006 2007 2008 2009 2010
> 2004 0 5 2 3 0 3
> 2005 0 0 0 0 0 0
> 2007 0 0 0 2 3 2
> 2008 0 0 0 0 1 0
> 2009 0 0 0 0 0 0
>
> , , categoria = E, event = 1
>
> time2
> time 2005 2006 2007 2008 2009 2010
> 2004 7 2 1 1 3 4
> 2005 0 0 0 1 0 0
> 2007 0 0 0 3 1 3
> 2008 0 0 0 0 0 0
> 2009 0 0 0 0 0 0
>
> >
> > when I use this:
> >
> > surara<-survfit(Surv(time,time2,event)~categoria)
> >
> > Call: survfit(formula = Surv(time, time2, event) ~ categoria)
> >
> > records n.max n.start events median 0.95LCL 0.95UCL
> > categoria=C 94 63 0 21 NA NA NA
> > categoria=E 111 77 0 26 NA NA NA
> >> summary(surara)
> > Call: survfit(formula = Surv(time, time2, event) ~ categoria)
> >
> > categoria=C
> > time n.risk n.event entered censored survival std.err lower 95% CI
> > upper 95% CI
> > 2006 63 5 0 23 0.921 0.0341
> > 0.856 0.990
> > 2007 35 2 30 1 0.868 0.0483
> > 0.778 0.968
> > 2008 62 5 1 3 0.798 0.0536
> > 0.700 0.910
> > 2009 55 4 0 5 0.740 0.0570
> > 0.636 0.861
> > 2010 46 5 0 41 0.660 0.0611
> > 0.550 0.791
> >
> > categoria=E
> > time n.risk n.event entered censored survival std.err lower 95% CI
> > upper 95% CI
> > 2005 71 7 3 0 0.901 0.0354
> > 0.835 0.973
> > 2006 67 2 0 22 0.875 0.0391
> > 0.801 0.955
> > 2007 43 1 36 1 0.854 0.0432
> > 0.774 0.943
> > 2008 77 5 0 8 0.799 0.0469
> > 0.712 0.896
> > 2009 64 4 1 3 0.749 0.0502
> > 0.657 0.854
> > 2010 58 7 0 51 0.658 0.0545
> > 0.560 0.774
>
> You see that your first survfit object is offering a simple sum of
> 'time2' columns of that tabulation as its 'n.event' values. It's
> 'n.risk' tabulation is not taking note of whether a case started in
> any particular prior interval. The n.risk sum appears to be the sum of
> persons surviving from the prior year less any decedents plus any
> entrants as reflected in "future" events on that row You notice
> that there are missing years even in that report: 2004,2005 for
> category C and 2004 for category E since there are no events in
> columns for those 'time2' values.
>
> >
> > but whe I included type=interval,
> >
> >> suraraint<-
> >> survfit(Surv(time,time2,event,type='interval')~categoria) # falta
> >> arreglar lo del intervalo!!!
> >> summary(suraraint)
> > Call: survfit(formula = Surv(time, time2, event, type = "interval") ~
> > categoria)
> >
> > categoria=C
> > time n.risk n.event survival std.err lower 95% CI upper 95% CI
> > 2004 95.00 13.14 0.862 0.0354 0.795 0.934
> > 2007 31.86 7.19 0.667 0.0695 0.544 0.818
> > 2008 1.67 1.67 0.000 NaN NA NA
> >
> > categoria=E
> > time n.risk n.event survival std.err lower 95% CI upper 95% CI
> > 2004 112.0 18.47 0.835 0.0351 0.769 0.907
> > 2005 40.5 1.06 0.813 0.0401 0.738 0.896
> > 2007 37.5 7.46 0.651 0.0620 0.540 0.785
>
>
> The second object's n.event, when Surv() was constructed with
> type="interval", has values based on the starting 'time' rows, but I
> am unable to deduce the estimating algorithm. I remember Therneau
> saying it wasn't a simple algorithm. The 2008 row in category C has
> one entry of 1 in the next year and there were no censoring for C-
> entrants in that year. Why the n.event is 1.67 I cannot say, but at
> least the n.event does not exceed the n.risk. The code or a copy of
> Therneau and Grambsch would be sensible places to look for answer by
> my initial efforts in those direction have not illuminated me.
>
More information about the R-help
mailing list