[R] Help understanding why glm and lrm.fit runs with my data, but lrm does not
Bonnett, Laura
L.J.Bonnett at liverpool.ac.uk
Mon Sep 18 14:46:19 CEST 2017
Many thanks for the assistance. I am using a small sample of GUSTO-1 as a teaching demonstration. The Gusto-1 dataset in various smaller subsets is available from this website: http://clinicalpredictionmodels.org/doku.php?id=rcode_and_data:start which is associated with the Clinical Prediction Models book by Steyerberg.
Many thanks again for your assistance.
Kind regards,
Laura
From: harrelfe at gmail.com [mailto:harrelfe at gmail.com] On Behalf Of Frank Harrell
Sent: 14 September 2017 17:22
To: David Winsemius <dwinsemius at comcast.net>
Cc: Bonnett, Laura <ljbcmshe at liverpool.ac.uk>; r-help at r-project.org
Subject: Re: [R] Help understanding why glm and lrm.fit runs with my data, but lrm does not
Fixed 'maxiter' in the help file. Thanks.
Please give the original source of that dataset.
That dataset is a tiny sample of GUSTO-I and not large enough to fit this model very reliably.
A nomogram using the full dataset (not publicly available to my knowledge) is already available in http://biostat.mc.vanderbilt.edu/tmp/bbr.pdf
Use lrm, not lrm.fit for this. Adding maxit=20 will probably make it work on the small dataset but still not clear on why you are using this dataset.
Frank
________________________________
Frank E Harrell Jr
Professor
School of Medicine
Department of Biostatistics
Vanderbilt University
On Thu, Sep 14, 2017 at 10:48 AM, David Winsemius <dwinsemius at comcast.net<mailto:dwinsemius at comcast.net>> wrote:
> On Sep 14, 2017, at 12:30 AM, Bonnett, Laura <L.J.Bonnett at liverpool.ac.uk<mailto:L.J.Bonnett at liverpool.ac.uk>> wrote:
>
> Dear all,
>
> I am using the publically available GustoW dataset. The exact version I am using is available here: https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdrive.google.com%2Fopen%3Fid%3D0B4oZ2TQA0PAoUm85UzBFNjZ0Ulk&data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636410009046132507&sdata=UZgX3%2Ba%2FU2Eeh8ybHMI6JnF0Npd2XJPXAzlmtEhDgOY%3D&reserved=0
>
> I would like to produce a nomogram for 5 covariates - AGE, HYP, KILLIP, HRT and ANT. I have successfully fitted a logistic regression model using the "glm" function as shown below.
>
> library(rms)
> gusto <- spss.get("GustoW.sav")
> fit <- glm(DAY30~AGE+HYP+factor(KILLIP)+HRT+ANT,family=binomial(link="logit"),data=gusto,x=TRUE,y=TRUE)
>
> However, my review of the literature and other websites suggest I need to use "lrm" for the purposes of producing a nomogram. When I run the command using "lrm" (see below) I get an error message saying:
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> Unable to fit model using "lrm.fit"
>
> My code is as follows:
> gusto2 <- gusto[,c(1,3,5,8,9,10)]
> gusto2$HYP <- factor(gusto2$HYP, labels=c("No","Yes"))
> gusto2$KILLIP <- factor(gusto2$KILLIP, labels=c("1","2","3","4"))
> gusto2$HRT <- factor(gusto2$HRT, labels=c("No","Yes"))
> gusto2$ANT <- factor(gusto2$ANT, labels=c("No","Yes"))
> var.labels=c(DAY30="30-day Mortality", AGE="Age in Years", KILLIP="Killip Class", HYP="Hypertension", HRT="Tachycardia", ANT="Anterior Infarct Location")
> label(gusto2)=lapply(names(var.labels),function(x) label(gusto2[,x])=var.labels[x])
>
> ddist = datadist(gusto2)
> options(datadist='ddist')
>
> fit1 <- lrm(DAY30~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Error in lrm(DAY30 ~ AGE + HYP + KILLIP + HRT + ANT, gusto2) :
> Unable to fit model using "lrm.fit"
>
> Online solutions to this problem involve checking whether any variables are redundant. However, the results for my data suggest that none are.
> redun(~AGE+HYP+KILLIP+HRT+ANT,gusto2)
>
> Redundancy Analysis
>
> redun(formula = ~AGE + HYP + KILLIP + HRT + ANT, data = gusto2)
>
> n: 2188 p: 5 nk: 3
>
> Number of NAs: 0
>
> Transformation of target variables forced to be linear
>
> R-squared cutoff: 0.9 Type: ordinary
>
> R^2 with which each variable can be predicted from all other variables:
>
> AGE HYP KILLIP HRT ANT
> 0.028 0.032 0.053 0.046 0.040
>
> No redundant variables
>
> I've also tried just considering "lrm.fit" and that code seems to run without error too:
> lrm.fit(cbind(gusto2$AGE,gusto2$KILLIP,gusto2$HYP,gusto2$HRT,gusto2$ANT),gusto2$DAY30)
>
> Logistic Regression Model
>
> lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
> gusto2$ANT), y = gusto2$DAY30)
>
> Model Likelihood Discrimination Rank Discrim.
> Ratio Test Indexes Indexes
> Obs 2188 LR chi2 233.59 R2 0.273 C 0.846
> 0 2053 d.f. 5 g 1.642 Dxy 0.691
> 1 135 Pr(> chi2) <0.0001 gr 5.165 gamma 0.696
> max |deriv| 4e-09 gp 0.079 tau-a 0.080
> Brier 0.048
>
> Coef S.E. Wald Z Pr(>|Z|)
> Intercept -13.8515 0.9694 -14.29 <0.0001
> x[1] 0.0989 0.0103 9.58 <0.0001
> x[2] 0.9030 0.1510 5.98 <0.0001
> x[3] 1.3576 0.2570 5.28 <0.0001
> x[4] 0.6884 0.2034 3.38 0.0007
> x[5] 0.6327 0.2003 3.16 0.0016
>
> I was therefore hoping someone would explain why the "lrm" code is producing an error message, while "lrm.fit" and "glm" do not. In particular I would welcome a solution to ensure I can produce a nomogram.
Try this:
lrm # look at code, do a search on "fail"
?lrm.fit # read the structure of the returned value of lrm.fit
my.fit <- lrm.fit(x = cbind(gusto2$AGE, gusto2$KILLIP, gusto2$HYP, gusto2$HRT,
gusto2$ANT), y = gusto2$DAY30)
print(my.fit$fail) # the error message you got from the lrm call means convergence failed
Documentation bug: The documentation of the cause of the 'fail'- value incorrectly gives the name of this parameter as 'maxiter' in the Value section.
--
David.
>
> Kind regards,
> Laura
>
> Dr Laura Bonnett
> NIHR Post-Doctoral Fellow
>
> Department of Biostatistics,
> Waterhouse Building, Block F,
> 1-5 Brownlow Street,
> University of Liverpool,
> Liverpool,
> L69 3GL
>
> 0151 795 9686
> L.J.Bonnett at liverpool.ac.uk<mailto:L.J.Bonnett at liverpool.ac.uk>
>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help at r-project.org<mailto:R-help at r-project.org> mailing list -- To UNSUBSCRIBE and more, see
> https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636410009046132507&sdata=GAPis8GXCfundLz48dX66AZfVTzxs%2BNBUmG1kgpx2Ro%3D&reserved=0
> PLEASE do read the posting guide https://na01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.R-project.org%2Fposting-guide.html&data=02%7C01%7Cf.harrell%40vanderbilt.edu%7Cadb58b13c3994f89209708d4fb8807f0%7Cba5a7f39e3be4ab3b45067fa80faecad%7C0%7C0%7C636410009046132507&sdata=C8xd7UizYeLM6bylOyad8bumQTsYOzFYZu2IcMo%2BUII%3D&reserved=0
> and provide commented, minimal, self-contained, reproducible code.
David Winsemius
Alameda, CA, USA
'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
[[alternative HTML version deleted]]
More information about the R-help
mailing list