[R] Nesting in Cox proportional hazards survivorship analysis

Greg Snow Greg.Snow at intermountainmail.org
Thu Jun 1 00:31:26 CEST 2006

The site/station syntax is mainly useful for situations where you have
the same station IDs within different sites (e.g. there is a station
#1 in site #1 and also a different station still labeled #1 within site
#2).  It is unclear whether you really need the /station part or not (it
generally does not hurt to include it even when it is not needed).  To
take into account the heterogeneity you can add +cluster(site) or
+cluster(station) to the model to tell it that there is correlation
within sites or stations (see the help on cluster and possibly on
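
A minimal sketch of the two ideas above, assuming the survival package
and a made-up data frame laid out like the design in the question (6
sites, 25 stations each, 6 sampling periods). The simulated values and
the names d, station_id and nesting_terms are hypothetical; only the
variable names in the formula come from the original post.

```r
library(survival)

set.seed(1)
# Hypothetical data: 6 sites x 25 stations x 6 sampling periods
d <- expand.grid(site = factor(1:6), station = factor(1:25), period = 1:6)
n <- nrow(d)
d$foodtype <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
d$foodsize <- factor(sample(c("small", "large"), n, replace = TRUE))
d$vpd      <- rnorm(n)               # made-up vapor pressure deficit
d$dt       <- rexp(n, rate = 0.1)    # made-up discovery times
d$status   <- rbinom(n, 1, 0.8)      # 1 = discovered, 0 = censored

# site/station is shorthand for site + site:station
nesting_terms <- attr(terms(~ site/station), "term.labels")

# Station labels repeat across sites, so build an ID that is unique
# to each site-by-station combination before clustering on it
d$station_id <- interaction(d$site, d$station, drop = TRUE)

# cluster() requests robust (sandwich) standard errors that allow for
# correlation among observations from the same station
fit <- coxph(Surv(dt, status) ~ foodtype * foodsize + vpd +
               cluster(station_id), data = d)
print(nesting_terms)
print(fit)
```

With cluster() the site and station effects are not estimated as
coefficients; the point estimates for the other terms are the same as
in a model without it, and only their standard errors are adjusted.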

Hope this helps,

Gregory (Greg) L. Snow Ph.D.
Statistical Data Center
Intermountain Healthcare
greg.snow at intermountainmail.org
(801) 408-8111

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch
[mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Jessica M Pearce
Sent: Wednesday, May 31, 2006 9:40 AM
To: r-help at stat.math.ethz.ch
Subject: [R] Nesting in Cox proportional hazards survivorship analysis

My advisor and I have been working on some survivorship analyses in R
and we are hoping to get some feedback on a particular issue involving
We are interested in patterns of food discovery by ant species. Our
observations consist of time to discovery by an ant for three different
food types, each of two different sizes. These data were collected at 6
plots located in each of two states. Every plot is divided into 25
stations, at which the observations were made. In a repeated measures
style design, all stations received all levels of food type and size
over the course of 6 sampling periods. So multiple measurements are
drawn from each station and site; however, each individual bait item is
only discovered once. We also have vapor pressure deficit measurements
(a measure that combines temperature and relative humidity) for each
discovery time. Each state is being analyzed separately and we are using
the Cox proportional hazards approach.
It is clear from preliminary analysis that there is a strong influence
of spatial heterogeneity as evidenced by significant contributions of
stations and plots to discovery. However, we are not necessarily
interested in the details of this heterogeneity and simply wish to
control for it in examining the other factors of the model. Thus we
employed what we think to be the appropriate nesting syntax in the model
we are running (as gleaned from Venables and Ripley 1999, 3rd edition),
with stations being nested within sites.
To provide an example of the syntax, the full model with which we began
TXa <- coxph(Surv(dt, status)~site/station+foodtype*foodsize+vpd,
This obviously generates a large number of terms, even as we work down
to the reduced model.
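
As a rough illustration of how many terms the nesting produces, here is
a sketch that counts the columns site/station would contribute for a
balanced layout of 6 sites with 25 stations each. The vectors site and
station below are made up to match the design described above and are
not the actual data.

```r
# Hypothetical balanced layout: one row per site-by-station combination
site    <- factor(rep(1:6, each = 25))
station <- factor(rep(1:25, times = 6))

# Design matrix for the nested term, which expands to site + site:station
X <- model.matrix(~ site/station)

# A Cox model has no intercept, so the nested term alone would
# contribute one coefficient fewer than the number of columns
n_coef <- ncol(X) - 1
print(c(columns = ncol(X), cox_coefficients = n_coef))
```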
Is this syntax testing what we think it is testing, i.e. are we
controlling for station effects in our results? Are there potential
problems with our approach to the analysis of which we should be aware
in our interpretation? We have looked at many sources of survivorship
analysis literature and haven't seen this nesting issue discussed,
besides briefly in Venables and Ripley. We recognize that this is an
unusual use of survivorship analysis and would appreciate any insight
Jessica Pearce
Biology Department
University of Utah
Salt Lake City, UT
