[R] [correction] Animal Morphology: Deriving Classification Equation with
cdm
mendenhallchase at gmail.com
Sun May 24 23:20:09 CEST 2009
Ted,
I just ran everything using the log of all variables. Much better analysis
and it doesn't violate the assumptions.
I'm still in the dark concerning the classification equation- other than the
fact that it now will contain log functions.
Thank you for your help,
Chase
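
For what it is worth, here is a minimal sketch of how a two-group classification equation on the log scale could be computed by hand. It uses simulated stand-in data (not the measurements from this thread), assumes equal priors, and the names lX, sex, Cj, cj0, delta and delta0 are only illustrative. The result is linear in log(WG), log(WT) and log(ID), which is where the log functions enter.

```r
library(MASS)  # lda(), used here only as a cross-check

set.seed(1)
n  <- 40
# Simulated log-measurements; hypothetical values, not the thread's data
lX <- rbind(
  cbind(lWG = rnorm(n, 4.20, 0.03), lWT = rnorm(n, 2.45, 0.05),
        lID = rnorm(n, 1.2, 0.25)),
  cbind(lWG = rnorm(n, 4.22, 0.03), lWT = rnorm(n, 2.55, 0.05),
        lID = rnorm(n, 2.0, 0.25))
)
sex <- factor(rep(c("F", "M"), each = n))

# Group means (one row per group) and pooled within-group covariance W
M     <- rowsum(lX, sex) / as.vector(table(sex))
resid <- lX - M[as.character(sex), ]
W     <- crossprod(resid) / (nrow(lX) - nlevels(sex))

# Classification function coefficients c_j = W^{-1} m_j (one column per
# group) and constants c_j0 = -1/2 m_j' W^{-1} m_j
Cj  <- solve(W, t(M))
dimnames(Cj) <- list(colnames(lX), rownames(M))
cj0 <- -0.5 * colSums(t(M) * Cj)

# With two groups the rule collapses to one equation:
# classify as "M" when delta0 + delta' log(x) > 0
delta  <- Cj[, "M"] - Cj[, "F"]
delta0 <- unname(cj0["M"] - cj0["F"])
pred   <- ifelse(drop(lX %*% delta) + delta0 > 0, "M", "F")

# Cross-check against lda() with equal priors
fit   <- lda(lX, grouping = sex, prior = c(0.5, 0.5))
agree <- all(pred == as.character(predict(fit, lX)$class))
print(round(c(constant = delta0, delta), 3))
print(agree)
```

The printed vector gives the constant and the three coefficients of the equation constant + b1*log(WG) + b2*log(WT) + b3*log(ID). With unequal priors the constant would need an extra log-prior term, which this sketch leaves out.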
Ted.Harding-2 wrote:
>
> [Apologies -- I made an error (see at [***] near the end)]
>
> On 24-May-09 19:07:46, Ted Harding wrote:
>> [Your data and output listings removed. For comments, see at end]
>>
>> On 24-May-09 13:01:26, cdm wrote:
>>> Fellow R Users:
>>> I'm not extremely familiar with lda or R programming, but a recent
>>> editorial review of a manuscript submission has prompted a crash
>>> course. I am on this forum hoping I could solicit some much needed
>>> advice for deriving a classification equation.
>>>
>>> I have used three basic measurements in lda to predict two groups:
>>> male and female. I have a working model, low Wilks' lambda, graphs,
>>> coefficients, eigenvalues, etc. (see below). I adjusted the sample
>>> analysis for Fisher's or Anderson's Iris data provided in the MASS
>>> library for my own data.
>>>
>>> My final step is simply to form the classification equation.
>>> The classification equation is simply using standardized coefficients
>>> to classify each group- in this case male or female. A more thorough
>>> explanation is provided:
>>>
>>> "For cases with an equal sample size for each group the classification
>>> function coefficient (Cj) is expressed by the following equation:
>>>
>>> Cj = cj0+ cj1x1+ cj2x2+...+ cjpxp
>>>
>>> where Cj is the score for the jth group, j = 1, ..., k, cj0 is the
>>> constant for the jth group, and x = raw scores of each predictor.
>>> If W = within-group variance-covariance matrix, and M = column matrix
>>> of means for group j, then the constant cj0 = (-1/2)CjMj" (Julia
>>> Barfield, John Poulsen, and Aaron French
>>> http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discriminant.htm).
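
The quoted passage appears to use Cj both for the group score and for the coefficient vector. A cleaner statement of the same thing, assuming equal prior probabilities and writing c_j for the coefficient vector (my notation, not the source's), would be:

```latex
\begin{align*}
\mathbf{c}_j &= W^{-1} M_j, \\
c_{j0} &= -\tfrac{1}{2}\, \mathbf{c}_j^{\top} M_j
        = -\tfrac{1}{2}\, M_j^{\top} W^{-1} M_j, \\
C_j(\mathbf{x}) &= c_{j0} + \mathbf{c}_j^{\top}\mathbf{x}
        = c_{j0} + c_{j1}x_1 + c_{j2}x_2 + \dots + c_{jp}x_p .
\end{align*}
```

A case is assigned to the group with the largest score C_j. With unequal priors p_j, the usual adjustment adds ln(p_j) to the constant c_j0.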
>>>
>>> I am unable to navigate this last step based on the R output I have.
>>> I only have the linear discriminant coefficients for each predictor
>>> that would be needed to complete this equation.
>>>
>>> Please, if anybody is familiar or able to help, please let me know.
>>> There is a spot in the acknowledgments for you.
>>>
>>> All the best,
>>> Chase Mendenhall
>>
>> The first thing I did was to plot your data. This indicates in the
>> first place that a perfect discrimination can be obtained on the
>> basis of your variables WRMA_WT and WRMA_ID alone (names abbreviated
>> to WG, WT, ID, SEX):
>>
>> D0 <- read.csv("horsesLDA.csv")
>> # names(D0) # "WRMA_WG" "WRMA_WT" "WRMA_ID" "WRMA_SEX"
>> WG<-D0$WRMA_WG; WT<-D0$WRMA_WT;
>> ID<-D0$WRMA_ID; SEX<-D0$WRMA_SEX
>>
>> ix.M<-(SEX=="M"); ix.F<-(SEX=="F")
>>
>> ## Plot WT vs ID (M & F)
>> plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
>> points(ID[ix.M],WT[ix.M],pch="+",col="blue")
>> points(ID[ix.F],WT[ix.F],pch="+",col="red")
>> lines(ID,15.5-1.0*(ID))
>>
>> and that there is a lot of possible variation in the discriminating
>> line WT = 15.5-1.0*(ID)
>>
>> Also, it is apparent that the covariance between WT and ID for Females
>> is different from the covariance between WT and ID for Males. Hence
>> the assumption (of common covariance matrix in the two groups) for
>> standard LDA (which you have been applying) does not hold.
>>
>> Given that the sexes can be perfectly discriminated within the data
>> on the basis of the linear discriminator (WT + ID) (and others),
>> the variable WG is in effect a close approximation to noise.
>>
>> However, to the extent that there was a common covariance matrix
>> to the two groups (in all three variables WG, WT, ID), and this
>> was well estimated from the data, then inclusion of the third
>> variable WG could yield a slightly improved discriminator in that
>> the probability of misclassification (a rare event for such data)
>> could be minimised. But it would not make much difference!
>>
>> However, since that assumption does not hold, this analysis would
>> not be valid.
>>
>> If you plot WT vs WG, a common covariance is more plausible; but
>> there is considerable overlap for these two variables:
>>
>> plot(WG,WT)
>> points(WG[ix.M],WT[ix.M],pch="+",col="blue")
>> points(WG[ix.F],WT[ix.F],pch="+",col="red")
>>
>> If you plot WG vs ID, there is perhaps not much overlap, but a
>> considerable difference in covariance between the two groups:
>>
>> plot(ID,WG)
>> points(ID[ix.M],WG[ix.M],pch="+",col="blue")
>> points(ID[ix.F],WG[ix.F],pch="+",col="red")
>>
>> This looks better on a log scale, however:
>>
>> lWG <- log(WG) ; lWT <- log(WT) ; lID <- log(ID)
>> ## Plot log(WG) vs log(ID) (M & F)
>> plot(lID,lWG)
>> points(lID[ix.M],lWG[ix.M],pch="+",col="blue")
>> points(lID[ix.F],lWG[ix.F],pch="+",col="red")
>>
>> and common covariance still looks good for WG vs WT:
>>
>> ## Plot log(WT) vs log(WG) (M & F)
>> plot(lWG,lWT)
>> points(lWG[ix.M],lWT[ix.M],pch="+",col="blue")
>> points(lWG[ix.F],lWT[ix.F],pch="+",col="red")
>>
>> but there is no improvement for WT vs ID:
>>
>> ## Plot log(WT) vs log(ID) (M & F)
>> plot(ID,WT,xlim=c(0,12),ylim=c(8,15))
>> points(ID[ix.M],WT[ix.M],pch="+",col="blue")
>> points(ID[ix.F],WT[ix.F],pch="+",col="red")
>
> [***]
> The above is incorrect! Apologies. I plotted the raw WT and ID
> instead of their logs. In fact, if you do plot the logs:
>
> ## Plot log(WT) vs log(ID) (M & F)
> plot(lID,lWT)
> points(lID[ix.M],lWT[ix.M],pch="+",col="blue")
> points(lID[ix.F],lWT[ix.F],pch="+",col="red")
>
> you now get what looks like much closer agreement between the
> two groups in the covariance cov(lID,lWT) than before. Hence, I
> would now suggest that you do your linear discrimination on the
> logarithms of the variables (since you also get agreement for the
> other pairs on the log scale).
>
> In fact:
>
> [Raw]:
> [Male]:
> cov(cbind(WG,WT,ID)[ix.M,])
> #            WG         WT          ID
> # WG  2.2552465 0.11074710 -0.02202080
> # WT  0.1107471 0.33853450  0.06601287
> # ID -0.0220208 0.06601287  0.31979368
>
> [Female]:
> cov(cbind(WG,WT,ID)[ix.F,])
> #           WG        WT        ID
> # WG 2.4716912 0.1577307 0.6670657
> # WT 0.1577307 0.3183928 0.2973335
> # ID 0.6670657 0.2973335 2.8326520
>
> [log]:
> [Male]:
> cov(cbind(lWG,lWT,lID)[ix.M,])
> #               lWG          lWT           lID
> # lWG  0.0006584465 0.0001813315 -0.0002133576
> # lWT  0.0001813315 0.0030368382  0.0030442356
> # lID -0.0002133576 0.0030442356  0.0693965979
>
> [Female]:
> cov(cbind(lWG,lWT,lID)[ix.F,])
> #              lWG          lWT         lID
> # lWG 0.0007244826 0.0002171885 0.001951343
> # lWT 0.0002171885 0.0019640076 0.003305884
> # lID 0.0019513428 0.0033058841 0.068406840
>
>
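
The raw-versus-log comparison of group covariances can be scripted rather than read off by eye. The sketch below uses simulated data in which the spread of ID is proportional to its mean; that is one common reason logs equalise variances, though I have not verified it is the mechanism in these data. The names X, cov_by_group, ratio_raw and ratio_log are made up for the illustration.

```r
set.seed(3)
n   <- 200
sex <- factor(rep(c("F", "M"), each = n))

# Simulated measurements (hypothetical, not the thread's data):
# ID has the same spread on the log scale in both groups but
# different means, so its raw variances differ between groups
X <- data.frame(
  WT = rlnorm(2 * n, meanlog = log(12), sdlog = 0.05),
  ID = c(rlnorm(n, log(4), 0.25), rlnorm(n, log(8), 0.25))
)

# One covariance matrix per sex
cov_by_group <- function(d) lapply(split(d, sex), cov)
raw <- cov_by_group(X)
lg  <- cov_by_group(log(X))

# Female-to-male variance ratios; values near 1 support a common covariance
ratio_raw <- diag(raw$F) / diag(raw$M)
ratio_log <- diag(lg$F) / diag(lg$M)
print(rbind(raw = ratio_raw, log = ratio_log))
```

Ratios far from 1 on the raw scale and close to 1 on the log scale would point the same way as the matrices above. This is only a descriptive check, not a formal test of covariance equality.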
>> So there is no simple road to applying a routine LDA to your data.
>>
>> To take account of different covariances between the two groups,
>> you would normally be looking at a quadratic discriminator. However,
>> as indicated above, the fact that a linear discriminator using
>> the variables ID & WT alone works so well would leave considerable
>> imprecision in conclusions to be drawn from its results.
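
As an illustration of the quadratic alternative mentioned here, this is a minimal sketch that fits both lda() and qda() from MASS to simulated two-group data with deliberately different covariance matrices. The data, the means and the covariance values are invented for the example and are not the measurements from this thread.

```r
library(MASS)  # lda(), qda(), mvrnorm()

set.seed(2)
n <- 60
# Simulated (WT, ID) pairs with unequal group covariances (hypothetical values)
xF <- mvrnorm(n, mu = c(10, 4), Sigma = matrix(c(0.30, 0.30, 0.30, 2.80), 2))
xM <- mvrnorm(n, mu = c(12, 8), Sigma = matrix(c(0.35, 0.05, 0.05, 0.30), 2))
X  <- rbind(xF, xM)
colnames(X) <- c("WT", "ID")
sex <- factor(rep(c("F", "M"), each = n))

fit_l <- lda(X, grouping = sex)  # assumes a common covariance matrix
fit_q <- qda(X, grouping = sex)  # one covariance matrix per group

# Resubstitution error rates (optimistic; cross-validation would be safer)
err_l <- mean(predict(fit_l, X)$class != sex)
err_q <- mean(predict(fit_q, X)$class != sex)
print(c(lda = err_l, qda = err_q))
```

When the groups are already well separated, the two error rates can be nearly identical, which is consistent with the point that the data may not say much about which rule is better.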
>>
>> Sorry this is not the straightforward answer you were hoping for
>> (which I confess I have not sought); it is simply a reaction to
>> what your data say.
>>
>> Ted.
>
> --------------------------------------------------------------------
> E-Mail: (Ted Harding) <Ted.Harding at manchester.ac.uk>
> Fax-to-email: +44 (0)870 094 0861
> Date: 24-May-09 Time: 21:49:50
> ------------------------------ XFMail ------------------------------
>
--
View this message in context: http://www.nabble.com/Animal-Morphology%3A-Deriving-Classification-Equation-with-Linear-Discriminat-Analysis-%28lda%29-tp23693355p23698217.html
Sent from the R help mailing list archive at Nabble.com.