[R] summary.manova rank deficiency error + data
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Thu Aug 14 09:32:56 CEST 2008
Peter Dalgaard wrote:
> Pedro Mardones wrote:
>> Thanks for the reply. The SAS output is attached but seems to me that
>> doesn't correspond to the wihtin-row contrasts as you suggested. By
>> the way, yes the data are highly correlated, in fact each row
>> correspond to the first part of a signal vector. Thanks anyway....
>> PM
>>
> Agreed. I tried disabling the check that causes R to protest, and then
> it gives similar DF but not quite the same statistics, quite possibly
> due to numerical instabilities in one or both systems. (You can easily
> try yourself, just do anova.mlm <- stats::anova.mlm and edit the qr()
> call inside.)
>
> > anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
> Analysis of Variance Table
>
> Df Wilks approx F num Df den Df Pr(>F) (Intercept) 1
> 0.002537 1887.24 5 24 <2e-16 ***
> GROUP 2 0.62 1.29 10 48 0.2616 Residuals
> 28 ---
> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
Actually, a better strategy is to transform the responses so as to
reduce the correlation. This works without messing with the tolerances
> anova(lm(cbind(Y1,cbind(Y2,Y3,Y4,Y5)-Y1)~GROUP, test), test = "Wilks")
Analysis of Variance Table
Df Wilks approx F num Df den Df Pr(>F)
(Intercept) 1 0.002537 1887.25 5 24 <2e-16 ***
GROUP 2 0.62 1.29 10 48 0.2616
Residuals 28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
You might want to try the same thing in SAS. (Also, check that the data
are actually the same. With this level of correlation, last-digit
differences can be influential.)
>
>>
>> The GLM Procedure
>> Multivariate Analysis of Variance
>> E = Error SSCP Matrix
>> y1 y2 y3
>> y4 y5
>> y1 0.0353518799 0.035256904 0.0351327804
>> 0.0349749601 0.0347868018
>> y2 0.035256904 0.0351627227 0.0350395053
>> 0.0348827098 0.0346956744
>> y3 0.0351327804 0.0350395053 0.0349173343
>> 0.0347617352 0.0345760232
>> y4 0.0349749601 0.0348827098 0.0347617352
>> 0.0346075203 0.0344233531
>> y5 0.0347868018 0.0346956744 0.0345760232
>> 0.0344233531 0.0342409225
>>
>> Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
>> DF = 28 y1 y2 y3
>> y4 y5
>> y1 1.000000 0.999992 0.999967
>> 0.999921 0.999852
>> <.0001 <.0001
>> <.0001 <.0001
>> y2 0.999992 1.000000 0.999991
>> 0.999963 0.999911
>> <.0001 <.0001
>> <.0001 <.0001
>> y3 0.999967 0.999991 1.000000
>> 0.999990 0.999958
>> <.0001 <.0001
>> <.0001 <.0001
>> y4 0.999921 0.999963 0.999990
>> 1.000000 0.999989
>> <.0001 <.0001 <.0001
>> <.0001
>> y5 0.999852 0.999911 0.999958
>> 0.999989 1.000000
>> <.0001 <.0001 <.0001
>> <.0001
>>
>> The SAS System 10:33 Wednesday, August 13, 2008 8
>> The GLM Procedure
>> Multivariate Analysis of Variance
>> H = Type III SSCP Matrix for group
>> y1 y2 y3
>> y4 y5
>> y1 0.0023822408 0.002365848 0.0023471328
>> 0.0023261249 0.0023030993
>> y2 0.002365848 0.0023495679 0.0023309816
>> 0.0023101183 0.0022872511
>> y3 0.0023471328 0.0023309816 0.0023125426
>> 0.0022918453 0.0022691608
>> y4 0.0023261249 0.0023101183 0.0022918453
>> 0.0022713359 0.0022488593
>> y5 0.0023030993 0.0022872511 0.0022691608
>> 0.0022488593 0.0022266141
>>
>> Characteristic Roots and Vectors of: E Inverse * H,
>> where
>> H = Type III SSCP Matrix for group
>> E = Error SSCP Matrix
>> Characteristic Characteristic Vector V'EV=1
>> Root Percent y1 y2 y3
>> y4 y5
>> 0.41840103 71.72 -7542.628 17131.814 5347.394
>> -31627.317 16700.100
>> 0.16496011 28.28 -4180.854 -4413.446 32096.035
>> -35545.204 12040.697
>> 0.00000001 0.00 -41004.875 107291.004 -95905.664
>> 32641.189 -3028.470
>> 0.00000000 0.00 -416.226 -111.206 410.721
>> 295.193 -171.953
>> 0.00000000 0.00 -14678.651 5787.997 54718.250
>> -69055.249 23218.580
>>
>> MANOVA Test Criteria and F Approximations for the Hypothesis of No
>> Overall group Effect
>> H = Type III SSCP Matrix for group
>> E = Error SSCP Matrix
>> S=2 M=1 N=11
>> Statistic Value F Value Num DF
>> Den DF Pr > F
>> Wilks' Lambda 0.60518744 1.37 10
>> 48 0.2227
>> Pillai's Trace 0.43658228 1.40 10
>> 50 0.2095
>> Hotelling-Lawley Trace 0.58336114 1.37 10
>> 33.362 0.2385
>> Roy's Greatest Root 0.41840103 2.09 5
>> 25 0.1000
>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Aug 13, 2008 at 4:34 AM, Peter Dalgaard
>> <p.dalgaard at biostat.ku.dk> wrote:
>>
>>> Pedro Mardones wrote:
>>>
>>>> Dear R-users;
>>>>
>>>> Previously I posted a question about the problem of rank deficiency in
>>>> summary.manova. As somebody suggested, I'm attaching a small part of
>>>> the data set.
>>>>
>>>> #***************************************************
>>>>
>>>> "test" <-
>>>>
>>>> structure(.Data = list(structure(.Data =
>>>> c(rep(1,3),rep(2,18),rep(3,10)),
>>>> levels = c("1", "2", "3"),
>>>> class = "factor")
>>>>
>>>>
>>>> ,c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,
>>>>
>>>>
>>>> 0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,
>>>>
>>>>
>>>> 0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733)
>>>>
>>>>
>>>>
>>>> ,c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,
>>>>
>>>>
>>>> 0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,
>>>>
>>>>
>>>> 0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746)
>>>>
>>>>
>>>>
>>>> ,c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,
>>>>
>>>>
>>>> 0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,
>>>>
>>>>
>>>> 0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826)
>>>>
>>>>
>>>>
>>>> ,c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,
>>>>
>>>>
>>>> 0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,
>>>>
>>>>
>>>> 0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001)
>>>>
>>>>
>>>>
>>>> ,c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,
>>>>
>>>>
>>>> 0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,
>>>>
>>>>
>>>> 0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
>>>>
>>>> )
>>>> ,names = c("GROUP", "Y1", "Y2", "Y3", "Y4","Y5")
>>>> ,row.names = seq(1:31)
>>>> ,class = "data.frame"
>>>> )
>>>>
>>>> summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
>>>>
>>>> #Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP,
>>>> test),
>>>> :
>>>> residuals have rank 3 < 5
>>>>
>>>> #***************************************************
>>>>
>>>> What I don't understand is why SAS returns no errors using PROC GLM
>>>> for the same data set. Is because PROC GLM doesn't take into account
>>>> problems of rank deficiency? So, should I trust manova instead of PROC
>>>> GLM output? I know it can be a touchy question but I would like to
>>>> receive some insights.
>>>> Thanks
>>>> PM
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>> What you have here is extremely correlated data:
>>>
>>>
>>>> (V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))
>>>>
>>> Y1 Y2 Y3 Y4 Y5
>>> Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
>>> Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
>>> Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
>>> Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
>>> Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889
>>>
>>>> eigen(V)
>>>>
>>> $values
>>> [1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12
>>>
>>> $vectors
>>> [,1] [,2] [,3] [,4] [,5]
>>> [1,] 0.4503756 0.61213579 0.5204920 -0.3485941 0.1732681
>>> [2,] 0.4491807 0.32333236 -0.1873653 0.5929444 -0.5540795
>>> [3,] 0.4476157 0.01442094 -0.5498688 0.1272921 0.6934503
>>> [4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
>>> [5,] 0.4432397 -0.65052351 0.5378809 0.2840428 0.1017918
>>>
>>> Notice the more than 9 orders of magnitude between the eigenvalues.
>>>
>>> I think that what is happening is that what SAS calls MANOVA is
>>> actually
>>> looking at within-row contrasts, which effectively removes the largest
>>> eigenvalue. In R, the equivalent would be
>>>
>>>
>>>> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")
>>>>
>>> Analysis of Variance Table
>>>
>>>
>>> Contrasts orthogonal to
>>> ~1
>>>
>>> Df Wilks approx F num Df den Df Pr(>F)
>>> (Intercept) 1 0.037 164.873 4 25 <2e-16 ***
>>> GROUP 2 0.701 1.215 8 50 0.3098
>>> Residuals 28
>>> ---
>>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> or (this could be computationally more precice, but in fact it gives
>>> the
>>> same result)
>>>
>>>
>>>> anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")
>>>>
>>> --
>>> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
>>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
>>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45)
>>> 35327907
>>>
>>>
>>>
>
>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list