[R] Re: cluster summary score

Frank E Harrell Jr fharrell at virginia.edu
Thu Aug 8 15:20:19 CEST 2002


This is confusing because if you do the variable clustering correctly, the cluster scores should be weakly correlated.  Check how you are doing the variable clustering and how you are interpreting measures of collinearity.
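
For what it's worth, here is a minimal sketch of the kind of check I mean.  The
data frame name, variable names, and cluster memberships are made up for
illustration, and argument spellings can differ a little across Hmisc versions:

  library(Hmisc)                      # varclus, pc1

  # Cluster the predictors on rank correlation and look at the tree; scores
  # that stay strongly correlated usually mean the tree was cut too finely or
  # a variable landed in the wrong cluster.
  vc <- varclus(~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9,
                similarity = "spearman", data = mydata)   # mydata: your data
  plot(vc)

  # One pc1 summary score per cluster (memberships here are illustrative only)
  s1 <- pc1(as.matrix(mydata[, c("x1", "x2", "x3")]))
  s2 <- pc1(as.matrix(mydata[, c("x4", "x5", "x6")]))
  s3 <- pc1(as.matrix(mydata[, c("x7", "x8", "x9")]))

  # Correctly formed cluster scores should show only weak correlations here
  round(cor(cbind(s1, s2, s3), method = "spearman"), 2)
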
-Frank Harrell

On Thu, 8 Aug 2002 13:23:05 +0100
Huan Huang <huang at stats.ox.ac.uk> wrote:

> Dear Prof. Harrell and R list,
> 
> I have done the variable clustering and computed the summary scores. Thanks a
> lot for your kind help.
> 
> But it hasn't solved the collinearity problem in my dataset. After the
> clustering and transcan, there is still very strong collinearity between the
> summary scores. The objective of my project is to find the influential
> variables, and I believe any variable reduction is not appropriate while that
> collinearity is present. I am thinking about principal component regression
> and variable reduction based on it (Rudolf J. Freund and William J. Wilson
> (1998), p. 215).
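
(For concreteness, the principal component regression you describe would look
roughly like the sketch below. X and y are hypothetical stand-ins for the
matrix of explanatory variables and the binary response, and the number of
components kept is arbitrary here; prcomp and glm are in base R.)

  pca <- prcomp(X, center = TRUE, scale. = TRUE)
  summary(pca)                        # variance explained, to help choose k
  k <- 3                              # number of components to keep (made up)
  scores <- pca$x[, 1:k]              # component scores for the regression
  fit <- glm(y ~ scores, family = binomial)
  summary(fit)
  pca$rotation[, 1:k]                 # loadings: tie components back to variables
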
> 
> Does anybody have a suggestion on variable reduction under this condition?
> I would appreciate any information.
> 
> Best
> 
> Huan
> ----- Original Message -----
> From: "Frank E Harrell Jr" <fharrell at virginia.edu>
> To: "Huan Huang" <huang at stats.ox.ac.uk>
> Sent: Sunday, August 04, 2002 7:56 PM
> Subject: Re: cluster summary score
> 
> 
> > On Sun, 4 Aug 2002 19:48:22 +0100
> > Huan Huang <huang at stats.ox.ac.uk> wrote:
> >
> > >
> > >
> > > > This was just done by
> > > >
> > > > f <- lrm(y ~ all cluster summary scores)
> > > > fastbw(f, suitable stopping criteria)
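
(Concretely, that might look like the sketch below. The score names and the
stopping rule are only illustrative; lrm and fastbw are in the Design library,
which also requires Hmisc.)

  library(Design)

  # score1..score3 stand for the per-cluster summary scores (names made up)
  d <- data.frame(y, score1, score2, score3)
  f <- lrm(y ~ score1 + score2 + score3, data = d)

  # Conservative stopping rule (large alpha): drop only clearly insignificant
  # cluster scores, as described further down in this thread.
  fastbw(f, rule = "p", sls = 0.5)
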
> > >
> > > Thank you very much for your kind reply. But I don't know how to get the
> > > cluster summary score.
> > >
> > > I did:
> > > t <- transcan(x, transform = T)
> > > t$transform
> > >
> > > I got a new matrix, with the transformed value for each variable. How
> > > can I get the cluster summary scores?
> >
> > Did you see the little pc1 function I defined in Hmisc?  I just do things
> > like
> >
> > p1 <- pc1(t$transform), or pc1(t$transform[, c(3,5,7)]) to use variables
> > 3, 5, and 7
> >
> > Frank
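
(Spelled out, something like the sketch below. The column groupings are
whatever varclus put together, so the indices here are only an example; note
that in newer Hmisc versions the transcan output component is t$transformed
rather than t$transform.)

  library(Hmisc)                      # pc1, transcan, varclus

  z  <- t$transform                   # transformed predictors from transcan
  s1 <- pc1(z[, c(1, 2)])             # one summary score per cluster
  s2 <- pc1(z[, c(3, 5, 7)])
  s3 <- pc1(z[, c(4, 6)])
  scores <- data.frame(s1, s2, s3)    # feed these to lrm()/fastbw()
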
> >
> > >
> > > Huan
> > >
> > > >
> > > > Doing the fast backward stepdown is safer with cluster scores than with
> > > > raw variables, especially if you use conservative stopping criteria
> > > > (e.g., large alpha).  I allowed "highly insignificant" cluster scores to
> > > > be dropped, and did not ever look at their component variables again.
> > > >
> > > > Frank
> > > >
> > > > >
> > > > > Actually I am doing my thesis project. My explanatory variables have
> > > > > serious collinearity. I have used the functions transcan and varclus
> > > > > on the variables and found some clusters. I am trying to use the
> > > > > method introduced in this section to drop some variables. I want to
> > > > > know how you compute the cluster summary scores.
> > > > >
> > > > > Thanks a lot and looking forward to hearing from you.
> > > > >
> > > > > Huan
> > > > > ----- Original Message -----
> > > > > From: "Frank E Harrell Jr" <fharrell at virginia.edu>
> > > > > To: <pmj at jciconsult.com>
> > > > > Cc: <r-help at stat.math.ethz.ch>
> > > > > Sent: Sunday, August 04, 2002 4:36 PM
> > > > > Subject: Re: [R] Pseudo R^2 for logit - really naive question
> > > > >
> > > > >
> > > > > > The Nagelkerke R^2 is commonly used.  The lrm function in the Design
> > > > > > library computes this for logistic regression.  The numerator is
> > > > > > 1 - exp(-LR/n), where LR is the likelihood ratio chi-square statistic
> > > > > > and n is the total sample size.  Divide it by the maximum attainable
> > > > > > value of this if the model is perfect (which is a simple function of
> > > > > > the -2 log likelihood with an intercept-only model) to get
> > > > > > Nagelkerke's R^2.  The numerator is exactly the ordinary R^2 in OLS,
> > > > > > as LR = -n log(1 - R^2) there.  For a more interpretable index, and
> > > > > > one that measures purely discrimination ability, the ROC area or
> > > > > > "C index", which is essentially a Mann-Whitney statistic based on
> > > > > > concordance probability, is recommended.  The lrm function also
> > > > > > outputs this, or you can get it from the somers2 or rcorr.cens
> > > > > > functions in the Hmisc library.
> > > > > >
> > > > > > Frank Harrell
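
(To make those two quantities concrete, here is a small sketch starting from a
plain glm fit; the model formula is hypothetical, and lrm() in the Design
library reports both statistics directly.)

  fit <- glm(y ~ x1 + x2, family = binomial)
  n   <- length(fit$y)

  LR     <- fit$null.deviance - fit$deviance   # likelihood ratio chi-square
  r2.num <- 1 - exp(-LR / n)                   # numerator: 1 - exp(-LR/n)
  r2.max <- 1 - exp(-fit$null.deviance / n)    # its value for a perfect model
  r2.num / r2.max                              # Nagelkerke's R^2

  library(Hmisc)
  somers2(fitted(fit), fit$y)["C"]             # ROC area / C index
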
> > > > > >
> > > > > > On Sun, 4 Aug 2002 09:08:46 -0400
> > > > > > "Paul M. Jacobson" <pmj at jciconsult.com> wrote:
> > > > > >
> > > > > > > I am using GLM to calculate logit models based on cross-sectional
> > > > > > > data.  I am now down to the hard work of making the results
> > > > > > > intelligible to very average readers.  Is there any way to
> > > > > > > calculate a pseudo analogue to the R^2 in standard linear
> > > > > > > regression, for use as a purely descriptive statistic of goodness
> > > > > > > of fit?  Most of the readers of my report will be vaguely familiar,
> > > > > > > and more comfortable, with R^2 than with any other regression
> > > > > > > diagnostics.
> > > > > > >
> > > > > > > Paul M. Jacobson
> > > > > > > Jacobson Consulting Inc.
> > > > > > > 80 Front Street East, Suite 720
> > > > > > > Toronto, ON, M5E 1T4
> > > > > > > Voice:  +1(416)868-1141
> > > > > > > Farm: +1(519)463-6061/6224
> > > > > > > Fax: +1(416)868-1131
> > > > > > > E-mail: pmj at jciconsult.com
> > > > > > > Web:  http://www.jciconsult.com/
> > > > > > >


-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat