[R] Problem comparing hazard ratios
Frank E Harrell Jr
f.harrell at Vanderbilt.Edu
Thu Apr 1 00:15:20 CEST 2010
P.S. Here is some code that is more directly relevant.
Frank
cphadjchisq <- function(adjust, candidates, S)
{
require(rms)
f <- coxphFit(adjust, S, type='right', method='efron')
if(f$fail) stop('could not fit a model with just adjust')
ll2 <- -2*f$loglik[2]
chisq.adjust <- 2*diff(f$loglik)
p <- ncol(candidates)
chisq <- numeric(p)
names(chisq) <- colnames(candidates)
for(i in 1:p)
{
f <- coxphFit(cbind(adjust, candidates[,i]), S,
type='right', method='efron')
chisq[i] <- if(f$fail) NA else ll2 + 2*f$loglik[2]
}
list(chisq=chisq, P=1 - pchisq(chisq, 1), chisq.adjust=chisq.adjust,
padjust=ncol(adjust), pcandidates=ncol(candidates))
}
bootrankcph <- function(adjust, candidates, S, B=500, conf.int=0.95,
pr=TRUE)
{
adjust <- as.matrix(adjust)
candidates <- as.matrix(candidates)
i <- is.na(rowSums(adjust) + rowSums(candidates) + rowSums(S))
if(any(i))
{
i <- !i
adjust <- adjust[i,,drop=FALSE]
candidates <- candidates[i,,drop=FALSE]
S <- S[i,,drop=FALSE]
}
n <- nrow(S)
orig <- cphadjchisq(adjust, candidates, S)
orig.rank <- rank(-orig$chisq)
p <- ncol(candidates)
brank <- matrix(NA, ncol=p, nrow=B)
for(i in 1:B)
{
if(pr) cat(i, '\r')
s <- sample(1:n, n, replace=TRUE)
w <- cphadjchisq(adjust[s,,drop=FALSE], candidates[s,,drop=FALSE],
S[s,,drop=FALSE])
brank[i,] <- rank(-w$chisq)
}
if(pr) cat('\n')
lower <- apply(brank, 2, function(w) quantile(w, (1-conf.int)/2,
na.rm=TRUE))
upper <- apply(brank, 2, function(w) quantile(w, (1+conf.int)/2,
na.rm=TRUE))
structure(list(stats=orig,
ranks=cbind(rank=orig.rank, lower=lower, upper=upper)),
class='bootrankcph')
}
print.bootrankcph <- function(x, ...)
{
s <- x$stats
cat('\nOriginal Model L.R. Statistic for Adjustment Variables\n\n',
'Chi-square=', round(s$chisq.adjust,2),'with', s$padjust,
'd.f.\n\nOriginal Partial L.R. Statistics for Candidates (1
d.f.)\n\n')
i <- order(s$chisq)
print(cbind('chi-square'=round(s$chisq[i],2),
P=format.pval(s$P[i], digits=3, eps=.001)),
quote=FALSE)
invisible()
}
plot.bootrankcph <- function(x, ...)
{
x <- x$ranks
cand <- if(length(rownames(x))) as.factor(rownames(x)) else
as.factor(1:nrow(x))
cand <- reorder.factor(cand, x[,1])
Dotplot(cand ~ Cbind(x), pch=16, xlab='Rank', ylab='Candidate')
}
require(rms)
n <- 100
age <- rnorm(n, 50, 10)
sex <- sample(c('female','male'), n, TRUE)
adjust <- model.matrix(~ rcs(age,3)*sex)
candidates <- matrix(rnorm(n*50), ncol=50,
dimnames=list(NULL, c(letters,LETTERS)[1:50]))
S <- Surv(runif(n))
w <- bootrankcph(adjust, candidates, S, B=100)
w$stats
plot(w)
Frank E Harrell Jr wrote:
> Ravi Varadhan wrote:
>> Frank,
>>
>> Is there an article that discusses this idea of bootstrapping the
>> ranks of
>> the likelihood ratio chi-square Statistics to assess relative
>> importance of predictors in time-to-event data
>> (specifically Cox PH model)?
>> Thanks,
>> Ravi.
>
> Do require(rms); ?anova.rms and see related articles:
>
> @Article{hal09usi,
> author = {Hall, Peter and Miller, Hugh},
> title = {Using the bootstrap to quantify the authority of an
> empirical ranking},
> journal = Annals of Stat,
> year = 2009,
> volume = 37,
> number = {6B},
> pages = {3929-3959},
> annote = {confidence interval for ranks;genomics;high
> dimension;independent component bootstrap;$m$-out-of-$n$
> bootstrap;ordering;overlap interval;prediction interval;synchronous
> bootstrap;ordinary bootstrap may not provide accurate confidence
> intervals for ranks;may need a different bootstrap if the number of
> parameters being ranked increases with $n$ or is large;estimating $m$ is
> difficult;in their first example, where $m=0.355n$, the ordinary
> bootstrap provided a lower bound to the lengths of more accurate
> confidence intervals of ranks}
> }
>
> @Article{xie09con,
> author = {Xie, Minge and Singh, Kesar and Zhang, {Cun-Hui}},
> title = {Confidence intervals for population ranks in the
> presence of ties and near ties},
> journal = JASA,
> year = 2009,
> volume = 104,
> number = 486,
> pages = {775-787},
> annote = {bootstrap ranks;ranking;nonstandard bootstrap
> inference;rank inference;slow convergence rate;smooth ranks in the
> presence of near ties;rank inference for fixed effects risk adjustment
> models}
> }
>
>>
>> -----Original Message-----
>> From: r-help-bounces at r-project.org
>> [mailto:r-help-bounces at r-project.org] On
>> Behalf Of Frank E Harrell Jr
>> Sent: Tuesday, March 30, 2010 3:57 PM
>> To: Michal Figurski
>> Cc: r-help at r-project.org
>> Subject: Re: [R] Problem comparing hazard ratios
>>
>> Michal Figurski wrote:
>>> Dear R-Helpers,
>>>
>>> I am a novice in survival analysis. I have the following code:
>>> for (i in 3:12) print(coxph(Surv(time, status)~a[,i], data=a))
>>>
>>> I used it to fit the Cox Proportional Hazard models separately for
>>> every available parameter (columns 3:12) in my data set - with
>>> intention to compare the Hazard Ratios.
>>>
>>> However, some of my variables are in range 0.1 to 1.6, others in
>>> range 5000 to 9000. How do I compare HRs between such variables?
>>>
>>> I have rescaled all the variables to be in 0 to 1 range - is this the
>>> proper way to go? Is there a way to somehow calculate the same HRs
>>> (as for rescaled parameters) from the HRs for original parameters?
>>>
>>> Many thanks in advance.
>>>
>>
>> There are a lot of issues related to this that will require a good bit
>> of study, both in survival analysis and in regression. I would start
>> with bootstrapping the ranks of the likelihood ratio chi-square
>> statistics of the competing biomarkers.
>>
>> Frank
>>
>
>
--
Frank E Harrell Jr Professor and Chairman School of Medicine
Department of Biostatistics Vanderbilt University
More information about the R-help
mailing list