[R] Cox model -missing data.
aoife doherty
aoife.m.doherty at gmail.com
Fri Dec 19 12:17:27 CET 2014
Many thanks, I appreciate the response.
When I convert the missing values to NA and run the Cox model as described
in my previous post, the model seems to drop every row that contains a
missing value: the number of rows "n" in the output is the same whether I
remove the incomplete rows myself beforehand or only recode the missing
values as NA.
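For illustration, here is a minimal sketch of that behaviour. It uses simulated data and plain coxph() from the survival package instead of the coxme model from the original post, so the data frame, the temporary file, and the 60/20 split are all assumptions made up for the example. It shows "*" being read as NA through the na.strings argument of read.table(), and the default na.action (na.omit) then dropping those rows from the fit.

```r
library(survival)

# Hypothetical data in the spirit of Test.cox: 60 people, with V3
# recorded as "*" (unknown) for the first 20 of them.
set.seed(1)
n <- 60
sim <- data.frame(
  V2       = round(runif(n, 20, 70)),
  V3       = rep(c("WTHomo", "Variant"), length.out = n),
  Survival = round(rexp(n, rate = 0.2), 1) + 0.1,
  Event    = rbinom(n, 1, 0.8)
)
sim$V3[1:20] <- "*"

# Round-trip through a file so that read.table() does the recoding.
tmp <- tempfile(fileext = ".cox")
write.table(sim, tmp, quote = FALSE, row.names = FALSE)

# na.strings = "*" tells read.table() to store "*" as NA.
death.dat <- read.table(tmp, header = TRUE, na.strings = "*")

# With the default na.action (na.omit), rows with an NA in any model
# variable are silently left out of the fit.
fit <- coxph(Surv(Survival, Event) ~ V2 + factor(V3), data = death.dat)

fit$n                   # number of rows actually used
length(fit$na.action)   # number of rows dropped because of NA
```

In this sketch fit$n is the number of complete cases, which is why recoding to NA and deleting the rows by hand give the same "n".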
What I had been hoping to do is not to remove a row entirely because one
covariate is missing, but rather to somehow censor or estimate the missing
value.
In reality, I have ~600 people with survival data and, say, 6 variables
attached to them. After I incorporate a 7th variable (which is not
available for every individual), I have 400 people left. Since I still have
survival data and almost all of the other information for the remaining 200
people (the only thing missing is that 7th variable), it seems a waste to
discard their survival data over one covariate. So instead of removing
those rows, I was hoping to somehow acknowledge in the model that this
particular covariate is missing for them. Does anyone know whether this can
be incorporated into the model I described above?
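A minimal sketch of what "acknowledging" the missing covariate could look like, under loud assumptions: the data are simulated, plain coxph() stands in for the coxme model above, and the column name V3m and the level name "Missing" are made up for the example. The idea is to recode NA as its own factor level so that na.omit has nothing to drop. This keeps every row in the fit, but it is only one possible approach and is known to give biased estimates in some settings; multiple imputation (for example with the mice package) is a commonly used alternative that is not shown here.

```r
library(survival)

# Hypothetical data: 60 people, V3 unknown (NA) for the first 20.
set.seed(2)
n <- 60
death.dat <- data.frame(
  V2       = round(runif(n, 20, 70)),
  V3       = rep(c("WTHomo", "Variant"), length.out = n),
  Survival = round(rexp(n, rate = 0.2), 1) + 0.1,
  Event    = rbinom(n, 1, 0.8)
)
death.dat$V3[1:20] <- NA

# Recode NA as an explicit "Missing" level (hypothetical name) so the
# default na.action no longer removes these rows.
death.dat$V3m <- factor(
  ifelse(is.na(death.dat$V3), "Missing", death.dat$V3),
  levels = c("WTHomo", "Variant", "Missing")
)

# Fit on all rows (missing-level coding) and on complete cases only.
fit_all <- coxph(Surv(Survival, Event) ~ V2 + V3m, data = death.dat)
fit_cc  <- coxph(Surv(Survival, Event) ~ V2 + V3,  data = death.dat)

c(all_rows = fit_all$n, complete_cases = fit_cc$n)
```

Here the first fit uses all 60 simulated rows and the second only the 40 complete ones. Whether the extra "Missing" coefficient is interpretable depends on why the values are missing.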
Thanks
On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding <Ted.Harding at wlandres.net>
wrote:
>
> Hi Aoife,
> I think that if you simply replace each "*" in the data file
> with "NA", then it should work ("NA" is usually interpreted
> as "missing" for those functions for which missingness is
> relevant). How you subsequently deal with records which have
> missing values is another question (or many questions ... ).
>
> So your data should look like:
>
> V1   V2  V3       Survival  Event
> ann  13  WTHomo   4         1
> ben  20  NA       5         1
> tom  40  Variant  6         1
>
> Hoping this helps,
> Ted.
>
> On 19-Dec-2014 10:12:00 aoife doherty wrote:
> > Hi all,
> >
> > I have a data set like this:
> >
> > Test.cox file:
> >
> > V1   V2  V3       Survival  Event
> > ann  13  WTHomo   4         1
> > ben  20  *        5         1
> > tom  40  Variant  6         1
> >
> >
> > where "*" indicates that I don't know what the value is for V3 for Ben.
> >
> > I've set up a Cox model to run like this:
> >
> > #!/usr/bin/Rscript
> > library(bdsmatrix)
> > library(kinship2)
> > library(survival)
> > library(coxme)
> > death.dat <- read.table("Test.cox", header = TRUE)
> > # kinship matrix built from the pedigree columns famid, ID, faid, moid
> > deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
> > sink("Test.cox.R.Output")
> > Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
> >                  strata(factor(V2)) + factor(V3) + (1 | ID),
> >                data = death.dat, varlist = deathdat.kmat)
> > Model
> > sink()
> >
> >
> >
> > As you can see from the Test.cox file, I have a missing value "*". How and
> > where do I tell the R script "treat * as a missing value"? If I can't
> > incorporate missing values into the model, I assume the alternative is to
> > remove all of the rows with missing data, which will greatly reduce my data
> > set, as most rows have at least one missing value.
> >
> > Thanks
> >
>
> -------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at wlandres.net>
> Date: 19-Dec-2014 Time: 10:21:23
> This message was sent by XFMail
> -------------------------------------------------
>