[R] summary.manova rank deficiency error + data
Peter Dalgaard
p.dalgaard at biostat.ku.dk
Wed Aug 13 18:04:41 CEST 2008
Pedro Mardones wrote:
> Thanks for the reply. The SAS output is attached but seems to me that
> doesn't correspond to the wihtin-row contrasts as you suggested. By
> the way, yes the data are highly correlated, in fact each row
> correspond to the first part of a signal vector. Thanks anyway....
> PM
>
Agreed. I tried disabling the check that causes R to protest, and then
it gives similar DF but not quite the same statistics, quite possibly
due to numerical instabilities in one or both systems. (You can easily
try yourself, just do anova.mlm <- stats::anova.mlm and edit the qr()
call inside.)
> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
Analysis of Variance Table
Df Wilks approx F num Df den Df Pr(>F)
(Intercept) 1 0.002537 1887.24 5 24 <2e-16 ***
GROUP 2 0.62 1.29 10 48 0.2616
Residuals 28
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> The GLM Procedure
> Multivariate Analysis of Variance
> E = Error SSCP Matrix
> y1 y2 y3
> y4 y5
> y1 0.0353518799 0.035256904 0.0351327804
> 0.0349749601 0.0347868018
> y2 0.035256904 0.0351627227 0.0350395053
> 0.0348827098 0.0346956744
> y3 0.0351327804 0.0350395053 0.0349173343
> 0.0347617352 0.0345760232
> y4 0.0349749601 0.0348827098 0.0347617352
> 0.0346075203 0.0344233531
> y5 0.0347868018 0.0346956744 0.0345760232
> 0.0344233531 0.0342409225
>
> Partial Correlation Coefficients from the Error SSCP Matrix / Prob > |r|
> DF = 28 y1 y2 y3
> y4 y5
> y1 1.000000 0.999992 0.999967
> 0.999921 0.999852
> <.0001 <.0001
> <.0001 <.0001
> y2 0.999992 1.000000 0.999991
> 0.999963 0.999911
> <.0001 <.0001
> <.0001 <.0001
> y3 0.999967 0.999991 1.000000
> 0.999990 0.999958
> <.0001 <.0001
> <.0001 <.0001
> y4 0.999921 0.999963 0.999990
> 1.000000 0.999989
> <.0001 <.0001 <.0001
> <.0001
> y5 0.999852 0.999911 0.999958
> 0.999989 1.000000
> <.0001 <.0001 <.0001 <.0001
>
> The SAS System 10:33 Wednesday, August 13, 2008 8
> The GLM Procedure
> Multivariate Analysis of Variance
> H = Type III SSCP Matrix for group
> y1 y2 y3
> y4 y5
> y1 0.0023822408 0.002365848 0.0023471328
> 0.0023261249 0.0023030993
> y2 0.002365848 0.0023495679 0.0023309816
> 0.0023101183 0.0022872511
> y3 0.0023471328 0.0023309816 0.0023125426
> 0.0022918453 0.0022691608
> y4 0.0023261249 0.0023101183 0.0022918453
> 0.0022713359 0.0022488593
> y5 0.0023030993 0.0022872511 0.0022691608
> 0.0022488593 0.0022266141
>
> Characteristic Roots and Vectors of: E Inverse * H, where
> H = Type III SSCP Matrix for group
> E = Error SSCP Matrix
> Characteristic Characteristic Vector V'EV=1
> Root Percent y1 y2 y3
> y4 y5
> 0.41840103 71.72 -7542.628 17131.814 5347.394
> -31627.317 16700.100
> 0.16496011 28.28 -4180.854 -4413.446 32096.035
> -35545.204 12040.697
> 0.00000001 0.00 -41004.875 107291.004 -95905.664
> 32641.189 -3028.470
> 0.00000000 0.00 -416.226 -111.206 410.721
> 295.193 -171.953
> 0.00000000 0.00 -14678.651 5787.997 54718.250
> -69055.249 23218.580
>
> MANOVA Test Criteria and F Approximations for the Hypothesis of No
> Overall group Effect
> H = Type III SSCP Matrix for group
> E = Error SSCP Matrix
> S=2 M=1 N=11
> Statistic Value F Value Num DF
> Den DF Pr > F
> Wilks' Lambda 0.60518744 1.37 10
> 48 0.2227
> Pillai's Trace 0.43658228 1.40 10
> 50 0.2095
> Hotelling-Lawley Trace 0.58336114 1.37 10
> 33.362 0.2385
> Roy's Greatest Root 0.41840103 2.09 5
> 25 0.1000
>
>
>
>
>
>
>
>
>
>
> On Wed, Aug 13, 2008 at 4:34 AM, Peter Dalgaard
> <p.dalgaard at biostat.ku.dk> wrote:
>
>> Pedro Mardones wrote:
>>
>>> Dear R-users;
>>>
>>> Previously I posted a question about the problem of rank deficiency in
>>> summary.manova. As somebody suggested, I'm attaching a small part of
>>> the data set.
>>>
>>> #***************************************************
>>>
>>> "test" <-
>>>
>>> structure(.Data = list(structure(.Data = c(rep(1,3),rep(2,18),rep(3,10)),
>>> levels = c("1", "2", "3"),
>>> class = "factor")
>>>
>>>
>>> ,c(0.181829,0.090159,0.115824,0.112804,0.134650,0.249136,0.163144,0.122012,0.157554,0.126283,
>>>
>>> 0.105344,0.125125,0.126232,0.084317,0.092836,0.108546,0.159165,0.121620,0.142326,0.122770,
>>>
>>> 0.117480,0.153762,0.156551,0.185058,0.161651,0.182331,0.139531,0.188101,0.103196,0.116877,0.113733)
>>>
>>>
>>> ,c(0.181445,0.090254,0.115840,0.112863,0.134610,0.249003,0.163116,0.122135,0.157206,0.126129,
>>>
>>> 0.105302,0.124917,0.126243,0.084455,0.092818,0.108458,0.158769,0.121244,0.141981,0.122595,
>>>
>>> 0.117556,0.153507,0.156308,0.184644,0.161421,0.181999,0.139376,0.187708,0.103126,0.116615,0.113746)
>>>
>>>
>>> ,c(0.181058,0.090426,0.115926,0.113022,0.134632,0.248845,0.163140,0.122331,0.156871,0.126023,
>>>
>>> 0.105335,0.124757,0.126325,0.084690,0.092885,0.108455,0.158386,0.120913,0.141676,0.122492,
>>>
>>> 0.117707,0.153293,0.156095,0.184242,0.161214,0.181670,0.139271,0.187318,0.103129,0.116421,0.113826)
>>>
>>>
>>> ,c(0.180692,0.090704,0.116110,0.113319,0.134745,0.248678,0.163256,0.122637,0.156581,0.125998,
>>>
>>> 0.105479,0.124686,0.126514,0.085066,0.093088,0.108587,0.158040,0.120674,0.141446,0.122488,
>>>
>>> 0.117972,0.153150,0.155954,0.183885,0.161063,0.181383,0.139251,0.186956,0.103232,0.116351,0.114001)
>>>
>>>
>>> ,c(0.180353,0.091088,0.116392,0.113753,0.134965,0.248520,0.163475,0.123046,0.156354,0.126067,
>>>
>>> 0.105726,0.124713,0.126821,0.085584,0.093432,0.108858,0.157742,0.120533,0.141309,0.122595,
>>>
>>> 0.118340,0.153088,0.155897,0.183582,0.160975,0.181143,0.139314,0.186636,0.103449,0.116415,0.114275)
>>> )
>>> ,names = c("GROUP", "Y1", "Y2", "Y3", "Y4","Y5")
>>> ,row.names = seq(1:31)
>>> ,class = "data.frame"
>>> )
>>>
>>> summary(manova(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), test = "Wilks")
>>>
>>> #Error in summary.manova(manova(cbind(Y1, Y2, Y3, Y4, Y5) ~ GROUP, test),
>>> :
>>> residuals have rank 3 < 5
>>>
>>> #***************************************************
>>>
>>> What I don't understand is why SAS returns no errors using PROC GLM
>>> for the same data set. Is because PROC GLM doesn't take into account
>>> problems of rank deficiency? So, should I trust manova instead of PROC
>>> GLM output? I know it can be a touchy question but I would like to
>>> receive some insights.
>>> Thanks
>>> PM
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>>
>>>
>> What you have here is extremely correlated data:
>>
>>
>>> (V <- estVar(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test)))
>>>
>> Y1 Y2 Y3 Y4 Y5
>> Y1 0.001262567 0.001259177 0.001254746 0.001249106 0.001242385
>> Y2 0.001259177 0.001255814 0.001251416 0.001245812 0.001239132
>> Y3 0.001254746 0.001251416 0.001247055 0.001241494 0.001234861
>> Y4 0.001249106 0.001245812 0.001241494 0.001235983 0.001229405
>> Y5 0.001242385 0.001239132 0.001234861 0.001229405 0.001222889
>>
>>> eigen(V)
>>>
>> $values
>> [1] 6.224077e-03 2.313066e-07 3.499837e-10 4.259125e-12 1.334146e-12
>>
>> $vectors
>> [,1] [,2] [,3] [,4] [,5]
>> [1,] 0.4503756 0.61213579 0.5204920 -0.3485941 0.1732681
>> [2,] 0.4491807 0.32333236 -0.1873653 0.5929444 -0.5540795
>> [3,] 0.4476157 0.01442094 -0.5498688 0.1272921 0.6934503
>> [4,] 0.4456201 -0.31202109 -0.3198606 -0.6557557 -0.4144143
>> [5,] 0.4432397 -0.65052351 0.5378809 0.2840428 0.1017918
>>
>> Notice the more than 9 orders of magnitude between the eigenvalues.
>>
>> I think that what is happening is that what SAS calls MANOVA is actually
>> looking at within-row contrasts, which effectively removes the largest
>> eigenvalue. In R, the equivalent would be
>>
>>
>>> anova(lm(cbind(Y1,Y2,Y3,Y4,Y5)~GROUP, test), X=~1, test = "Wilks")
>>>
>> Analysis of Variance Table
>>
>>
>> Contrasts orthogonal to
>> ~1
>>
>> Df Wilks approx F num Df den Df Pr(>F)
>> (Intercept) 1 0.037 164.873 4 25 <2e-16 ***
>> GROUP 2 0.701 1.215 8 50 0.3098
>> Residuals 28
>> ---
>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>
>> or (this could be computationally more precice, but in fact it gives the
>> same result)
>>
>>
>>> anova(lm(cbind(Y2,Y3,Y4,Y5)-Y1~GROUP, test), test = "Wilks")
>>>
>> --
>> O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
>> c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
>> (*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
>> ~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
>>
>>
>>
--
O__ ---- Peter Dalgaard Øster Farimagsgade 5, Entr.B
c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
(*) \(*) -- University of Copenhagen Denmark Ph: (+45) 35327918
~~~~~~~~~~ - (p.dalgaard at biostat.ku.dk) FAX: (+45) 35327907
More information about the R-help
mailing list