[R] Survival analysis
Daniel Malter
daniel at umd.edu
Wed Feb 17 21:53:22 CET 2010
There are numerous issues, some of which David has pointed out. I will add some and address some:
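1. As far as I understand, you look at only one population. For a survival model, you would need an event indicator (0/1) recording whether the species went extinct at a given time, rather than a probability; passing the probability Decline as the status is what triggers the 'Invalid status value' error. However, with only one extinction event in time, this model is nonsense.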
2. Your dependent variable, however, is decline (or, more likely, a prediction of the percentage of the existing population relative to its baseline at date t=0; that would be my guess). Echoing David, what was this logistic regression (what was the model)? Is it derived from a count of the animals in each time period? Working with such predictions can create all sorts of issues that may bias your results, and you may be better off working with the original data. Please give us more information on your dependent variable and on what this logistic regression was.
3. Your current dependent variable has a time-series nature, so you may be facing autocorrelation of the error term across observations. My best guess is that you would be better off modeling this as a time series, but again, we need more information.
4. As for the missing values, there are several ways to address this issue. First, imputation (this is probably not the right way to go when large amounts of data are missing; there is a large literature on imputation). Second, missing-value indicator coding: you create a second variable, a missing-value indicator, for each variable that contains NAs. You code the indicator 1 if the underlying variable is NA and 0 if it has a numeric value, and you recode all NAs in the underlying variable to 0.
Example of missing-value indicator coding (Oxygen = variable with NAs, Recoded = recoded oxygen variable, MVI = missing-value indicator):
Oxygen  Recoded  MVI
3       3        0
5       5        0
NA      0        1
NA      0        1
6       6        0
NA      0        1
4       4        0
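The recoding rule described above can be sketched in base R as follows. This is only an illustration: the vector holds the seven example oxygen values, and the object names (oxygen, mvi, recoded, coded) are made up for this sketch.

```r
# Example oxygen values (NA = missing reading)
oxygen <- c(3, 5, NA, NA, 6, NA, 4)

# Missing-value indicator: 1 where oxygen is NA, 0 otherwise
mvi <- as.integer(is.na(oxygen))

# Recoded variable: NAs replaced by 0, observed values kept unchanged
recoded <- ifelse(is.na(oxygen), 0, oxygen)

# Put the three columns side by side for inspection
coded <- data.frame(Oxygen = oxygen, Recoded = recoded, MVI = mvi)
print(coded)
```

Observed values pass through unchanged, so only the rows with NA get Recoded = 0 and MVI = 1.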
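If the data are missing at random, the coefficient on the MVI should be insignificant (provided that the recoded value 0 is a typical value of the variable, e.g. after centering it on its observed mean; otherwise the MVI coefficient also absorbs the offset between 0 and the variable's mean). If it comes out significant, it tells you that something about the observations for which your data are missing differs from the years for which you have observed the independent variables. But that requires us to figure out which model to use in the first place.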
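To illustrate how the MVI coefficient would be inspected, here is a rough sketch on simulated data. Everything in it is an assumption made for illustration: the data are invented, lm() is only a placeholder since the appropriate model is still open, and the predictor is centered on its observed mean before the NAs are set to 0 so that the MVI coefficient is not driven merely by the distance between 0 and the variable's mean.

```r
set.seed(1)                        # simulated data, for illustration only
n <- 200
oxygen <- rnorm(n, mean = 5)       # hypothetical predictor
y <- 1 + 0.5 * oxygen + rnorm(n)   # hypothetical response
oxygen[sample(n, 60)] <- NA        # delete 60 values completely at random

# Missing-value indicator: 1 where oxygen is NA, 0 otherwise
mvi <- as.integer(is.na(oxygen))

# Center on the observed mean (assumption of this sketch), then set NAs to 0
centered <- oxygen - mean(oxygen, na.rm = TRUE)
recoded  <- ifelse(is.na(centered), 0, centered)

# Placeholder model: the row for 'mvi' is the one to look at
fit <- lm(y ~ recoded + mvi)
coefs <- summary(fit)$coefficients
print(coefs)
```

The row labelled mvi in the printed table holds the estimate and p-value discussed above; with real data, the values to plug in would be the recoded predictors and their indicators.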
Best,
Daniel
-------------------------
cuncta stricte discussurus
-------------------------
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of FishR
Sent: Wednesday, February 17, 2010 1:55 PM
To: r-help at r-project.org
Subject: [R] Survival analysis
Dear all
I have a dataset examining the probability of a population surviving
(calculated from a logistic regression) of a species over a 200yr period.
The predictor variables are either continuous but non-normal (e.g.
temperature, oxygen) or categorical (e.g. channelisation), unfortunately I
also have a large amount of missing values.
Year Decline Temperature Oxygen Channelisation
1800 0.947758115 36.6 NA NA
1801 0.946135961 25.2 NA NA
1802 0.944466388 28.5 NA NA
1803 0.942748196 35.5 NA NA
1804 0.940980166 33 NA NA
1805 0.93916106 30.2 NA NA
truncated …
1999 0.028531339 10.5 NA 5
2000 0.027649801 8.4 NA 5
I have been trying to run a Cox Proportional Hazards Model with the code
model<-coxph(Surv(Year, Decline) ~ Temperature + Oxygen + Channelisation)
but keep getting an error message ‘Invalid status value’.
Have I inputted the data in the wrong format or am I trying to run a totally
unsuitable model?
Any help would be greatly appreciated
Tom
--
View this message in context: http://n4.nabble.com/Survival-analysis-tp1559155p1559155.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.