[R] means over factors in mlm terms

Tue Nov 21 17:46:41 CET 2006

Dear Mike,

I don't have time this morning to work this out in detail, but perhaps the
following will help:

model.response(model.frame(soils.mod)) will give you the response-variable
matrix for the model. variable.names(model.frame(soils.mod)) will give you
the names of the variables in the model, with the lhs first. If you need
more information about the structure of the model, you'll find that in
terms(soils.mod) (but I think that you've already discovered that).

I hope this helps,
 John

--------------------------------
John Fox
Department of Sociology
McMaster University
Hamilton, Ontario
Canada L8S 4M4
905-525-9140x23604
http://socserv.mcmaster.ca/jfox 
-------------------------------- 

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch 
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of 
> Michael Friendly
> Sent: Tuesday, November 21, 2006 11:21 AM
> To: R-Help
> Subject: [R] means over factors in mlm terms
> 
> I'm trying to write a function to find the means over factors 
> of the responses in a mlm (something I would do easily in SAS 
> with PROC SUMMARY).
> 
> The not-working stub of a function to do what I want is 
> below, and my problem is that I don't know how to call 
> aggregate (or some other function) in the context of terms in 
> a linear model extracted from a lm/mlm object.
> 
> means.mlm <- function(mod, terms, data) {
>    responses <-dimnames(mod$fitted.values)[[2]]
>    if (missing(terms)) terms <- mod$terms
>    n.terms <- length(terms)
> 
>    for (term in 1:n.terms){
>    	label <- attr(terms[term], "term.labels")
>    	cat(label,":\n")
>    	means <- aggregate(data, list(label), FUN=mean)
>    	print(means)
>    }
> }
> 
> Here is a sample context-- a randomized block design with two 
> crossed factors (Contour, Depth) and 9 responses (pH, N, ... 
> Conduc).  [An R-readable version of the data is appended at 
> the bottom.]
> 
>  >  soils <- read.delim("soils.dat")
>  >  str(soils)
> 'data.frame':   48 obs. of  14 variables:
>   $ Group  : int  1 1 1 1 2 2 2 2 3 3 ...
>   $ Contour: Factor w/ 3 levels "Depression","Slope",..: 3 3 
> 3 3 3 3 3 3
> 3 3 ...
>   $ Depth  : Factor w/ 4 levels "0-10","10-30",..: 1 1 1 1 2 
> 2 2 2 3 3 ...
>   $ Gp     : Factor w/ 12 levels "D0","D1","D3",..: 9 9 9 9 
> 10 10 10 10 
> 11 11 ...
>   $ Block  : int  1 2 3 4 1 2 3 4 1 2 ...
>   $ pH     : num  5.4 5.65 5.14 5.14 5.14 5.1 4.7 4.46 4.37 4.39 ...
>   $ N      : num  0.188 0.165 0.26 0.169 0.164 0.094 0.1 0.112 0.112 
> 0.058 ...
>   $ Dens   : num  0.92 1.04 0.95 1.1 1.12 1.22 1.52 1.47 1.07 1.54 ...
>   $ P      : int  215 208 300 248 174 129 117 170 121 115 ...
>   $ Ca     : num  16.4 12.3 13.0 11.9 14.2 ...
>   $ Mg     : num  7.65 5.15 5.68 7.88 8.12 ...
>   $ K      : num  0.72 0.71 0.68 1.09 0.7 0.81 0.39 0.7 0.74 0.77 ...
>   $ Na     : num  1.14 0.94 0.6 1.01 2.17 2.67 3.32 3.76 5.74 5.85 ...
>   $ Conduc : num  1.09 1.35 1.41 1.64 1.85 3.18 4.16 5.14 
> 5.73 6.45 ...
>  >
> I fit a mlm model,
> 
>  > attach(soils)
>  > soils.mod <- lm(cbind(pH,N,Dens,P,Ca,Mg,K,Na,Conduc) ~ Block +
> Contour*Depth)
> 
> and then I want to get tables of the means over the factors 
> in the model terms.  From the console, I can get the means 
> for a factor or model term (but with annoying warning messages
> 
>  > (m1<-aggregate(soils, list(Contour), mean))
>       Group.1 Group Contour Depth Gp Block       pH         N 
>     Dens 
>        P       Ca      Mg
> 1 Depression  10.5      NA    NA NA   2.5 4.691875 0.0873125 1.343125 
> 188.1875 7.101250 8.98625
> 2      Slope   6.5      NA    NA NA   2.5 4.746250 0.1059375 1.332500 
> 159.4375 8.109375 8.32000
> 3        Top   2.5      NA    NA NA   2.5 4.570000 0.1125625 1.271875 
> 150.8750 8.877500 8.08750
>           K       Na   Conduc
> 1 0.378125 5.823125 6.945625
> 2 0.415625 6.103125 6.963750
> 3 0.605000 4.872500 5.856250
> Warning messages:
> 1: argument is not numeric or logical: returning NA in: 
> mean.default(X[[1]], ...)
> 2: argument is not numeric or logical: returning NA in: 
> mean.default(X[[2]], ...)
>     ... (suppressed) ...
> 9: argument is not numeric or logical: returning NA in: 
> mean.default(X[[3]], ...)
>  >
> 
> When I call my function, I get:
> 
>  > means.mlm(soils.mod, data=soils)
> Block :
> Error in FUN(X[[1]], ...) : arguments must have same length  
> > traceback()
> 6: stop("arguments must have same length")
> 5: FUN(X[[1]], ...)
> 4: lapply(x, tapply, by, FUN, ..., simplify = FALSE)
> 3: aggregate.data.frame(data, list(label), FUN = mean)
> 2: aggregate(data, list(label), FUN = mean)
> 1: means.mlm(soils.mod, data = soils)
>  >
> 
> The soils data:
> 
> soils <-
> structure(list(Group = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 
> 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 
> 9, 9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12), 
> Contour = structure(c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 
> 3, 3, 3, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 
> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Label = 
> c("Depression", "Slope", "Top"), class = "factor"),
>      Depth = structure(c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4,
>      4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4,
>      1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), .Label 
> = c("0-10",
>      "10-30", "30-60", "60-90"), class = "factor"), Gp = 
> structure(c(9,
>      9, 9, 9, 10, 10, 10, 10, 11, 11, 11, 11, 12, 12, 12, 12,
>      5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 1, 1, 1,
>      1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4), .Label = c("D0",
>      "D1", "D3", "D6", "S0", "S1", "S3", "S6", "T0", "T1", "T3",
>      "T6"), class = "factor"), Block = c(1, 2, 3, 4, 1, 2, 3,
>      4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2,
>      3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1,
>      2, 3, 4), pH = c(5.4, 5.65, 5.14, 5.14, 5.14, 5.1, 4.7, 4.46,
>      4.37, 4.39, 4.17, 3.89, 3.88, 4.07, 3.88, 3.74, 5.11, 5.46,
>      5.61, 5.85, 4.57, 5.11, 4.78, 6.67, 3.96, 4, 4.12, 4.99,
>      3.8, 3.96, 3.93, 4.02, 5.24, 5.2, 5.3, 5.67, 4.46, 4.91,
>      4.79, 5.36, 3.94, 4.52, 4.35, 4.64, 3.82, 4.24, 4.22, 4.41
>      ), N = c(0.188, 0.165, 0.26, 0.169, 0.164, 0.094, 0.1, 0.112,
>      0.112, 0.058, 0.078, 0.07, 0.077, 0.046, 0.055, 0.053, 0.247,
>      0.298, 0.145, 0.186, 0.102, 0.097, 0.122, 0.083, 0.059, 0.05,
>      0.086, 0.048, 0.049, 0.036, 0.048, 0.039, 0.194, 0.256, 0.136,
>      0.127, 0.087, 0.092, 0.047, 0.095, 0.054, 0.051, 0.032, 0.065,
>      0.038, 0.035, 0.03, 0.058), Dens = c(0.92, 1.04, 0.95, 1.1,
>      1.12, 1.22, 1.52, 1.47, 1.07, 1.54, 1.26, 1.42, 1.25, 1.54,
>      1.53, 1.4, 0.94, 0.96, 1.1, 1.2, 1.37, 1.3, 1.3, 1.42, 1.53,
>      1.5, 1.55, 1.46, 1.48, 1.28, 1.42, 1.51, 1, 0.78, 1, 1.13,
>      1.24, 1.47, 1.46, 1.26, 1.6, 1.53, 1.55, 1.46, 1.4, 1.47,
>      1.56, 1.58), P = c(215, 208, 300, 248, 174, 129, 117, 170,
>      121, 115, 112, 117, 127, 91, 91, 79, 261, 300, 242, 229,
>      156, 139, 214, 132, 98, 115, 148, 97, 108, 103, 109, 100,
>      445, 380, 259, 248, 276, 158, 121, 195, 148, 115, 82, 152,
>      105, 100, 97, 130), Ca = c(16.35, 12.25, 13.02, 11.92, 14.17,
>      8.55, 8.74, 9.49, 8.85, 4.73, 6.29, 6.61, 6.41, 3.82, 4.98,
>      5.86, 13.25, 12.3, 9.66, 13.78, 8.58, 8.58, 8.22, 12.68,
>      4.8, 5.06, 6.16, 7.49, 3.82, 4.78, 4.93, 5.66, 12.27, 11.39,
>      9.96, 9.12, 7.24, 7.37, 6.99, 8.59, 4.85, 6.34, 5.99, 4.43,
>      4.65, 4.56, 5.29, 4.58), Mg = c(7.65, 5.15, 5.68, 7.88, 8.12,
>      6.92, 8.16, 9.16, 10.35, 6.91, 7.95, 9.76, 10.96, 6.61, 8,
>      10.14, 7.55, 7.5, 6.76, 7.12, 9.92, 8.69, 7.75, 9.56, 10,
>      8.91, 7.58, 9.38, 8.8, 7.29, 7.47, 8.84, 6.27, 7.55, 8.08,
>      7.04, 9.4, 10.57, 9.91, 8.66, 9.62, 9.78, 9.73, 10.54, 9.85,
>      8.95, 8.37, 9.46), K = c(0.72, 0.71, 0.68, 1.09, 0.7, 0.81,
>      0.39, 0.7, 0.74, 0.77, 0.26, 0.41, 0.56, 0.5, 0.23, 0.41,
>      0.61, 0.68, 0.63, 0.62, 0.63, 0.42, 0.32, 0.55, 0.36, 0.28,
>      0.16, 0.4, 0.24, 0.24, 0.14, 0.37, 0.72, 0.78, 0.45, 0.55,
>      0.43, 0.59, 0.3, 0.48, 0.18, 0.34, 0.22, 0.22, 0.18, 0.33,
>      0.14, 0.14), Na = c(1.14, 0.94, 0.6, 1.01, 2.17, 2.67, 3.32,
>      3.76, 5.74, 5.85, 5.3, 8.3, 9.67, 7.67, 8.78, 11.04, 1.86,
>      2, 1.01, 3.09, 3.67, 4.7, 3.07, 8.3, 6.52, 7.91, 6.39, 9.7,
>      9.57, 9.67, 9.65, 10.54, 1.02, 1.63, 1.97, 1.43, 4.17, 5.07,
>      5.15, 4.17, 7.2, 8.52, 7.02, 7.61, 10.15, 10.51, 8.27, 9.28
>      ), Conduc = c(1.09, 1.35, 1.41, 1.64, 1.85, 3.18, 4.16, 5.14,
>      5.73, 6.45, 8.37, 9.21, 10.64, 10.07, 11.26, 12.15, 2.61,
>      1.98, 0.76, 2.85, 3.24, 4.63, 3.67, 8.1, 7.72, 9.78, 9.07,
>      9.13, 11.57, 11.42, 13.32, 11.57, 0.75, 2.2, 2.27, 0.67,
>      5.08, 6.37, 6.82, 3.65, 10.14, 9.74, 8.6, 9.09, 12.26, 11.29,
>      9.51, 12.69)), .Names = c("Group", "Contour", "Depth", 
> "Gp", "Block", "pH", "N", "Dens", "P", "Ca", "Mg", "K", "Na", "Conduc"
> ), class = "data.frame", row.names = c("1", "2", "3", "4", 
> "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
> "16", "17", "18", "19", "20", "21", "22", "23", "24", "25", 
> "26", "27", "28", "29", "30", "31", "32", "33", "34", "35", 
> "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", 
> "46", "47", "48"))
> 
> -- 
> Michael Friendly     Email: friendly AT yorku DOT ca
> Professor, Psychology Dept.
> York University      Voice: 416 736-5115 x66249 Fax: 416 736-5814
> 4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
> Toronto, ONT  M3J 1P3 CANADA
> 
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.