[R] Some questions about R's modelling algebra

Thomas Lumley tlumley at u.washington.edu
Fri Jul 2 18:19:32 CEST 2010

On Fri, 2 Jul 2010, Hadley Wickham wrote:

> Hi all,
> In preparation for teaching a class next week, I've been reviewing R's
> standard modelling algebra. I've used it for a long time and have a
> pretty good intuitive feel for how it works, but would like to
> understand more of the technical details. The best (online) reference
> I've found so far is the section in "An Introduction to R"
> (http://cran.r-project.org/doc/manuals/R-intro.html#Formulae-for-statistical-models).
> Does anyone have any other suggestions?
> I have a few questions about the definitions given in "An Introduction to R":
> * "M_1 : M_2 - The tensor product of M_1 and M_2. If both terms are
> factors, then the “subclasses” factor."
>   From my reading, the usual interpretation of a tensor product when
> x and y are vectors is the outer product.  I don't see how that would
> work here - how does a matrix work as a predictor in a linear model?

Think of it for a single observation.  x and y specify terms that could be scalars or row vectors (e.g. ns(x), poly(y,3)), and the terms in x:y are the products of each term from x with each term from y.  It is like taking the Kronecker product and then reshaping it back into a row vector.
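A small R sketch of the row-vector picture above, using made-up data (the names x, y, X, px, py and manual are illustrative only).  It builds the columns of poly(x, 2):poly(y, 3) with model.matrix() and compares them with the per-observation products of the columns of poly(x, 2) and poly(y, 3).  It assumes model.matrix() lists interaction columns with the first term's columns varying fastest.

```r
# Illustrative data; any two numeric vectors of equal length would do.
set.seed(1)
x <- runif(20)
y <- runif(20)

# Drop the intercept so that only the 2 x 3 = 6 product columns remain.
X <- model.matrix(~ poly(x, 2):poly(y, 3) - 1)

px <- poly(x, 2)
py <- poly(y, 3)

# For observation i, form every product px[i, j] * py[i, k] and flatten the
# 2 x 3 outer product column-wise back into a row of length 6.
manual <- t(sapply(seq_along(x), function(i) as.vector(outer(px[i, ], py[i, ]))))

ncol(X)
all.equal(as.vector(X), as.vector(manual))
```

The comparison uses as.vector() because model.matrix() attaches extra attributes (column names, "assign") that the hand-built matrix lacks.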

> In what sense is the tensor product of x with itself equal to x?

This is the messy bit.  The 'product' operator is not the arithmetic product, because x:x is not the same as x:z even if z=x.

The product of a set of single-column terms is formed by eliminating any terms from the set that are syntactically duplicates and then taking the arithmetic product of the remaining terms.  This is the Right Thing for producing design matrices, but is a bit of a mess to describe.
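One way to see the syntactic duplicate elimination is to ask terms() for the term labels it keeps; no data are needed because terms() does not evaluate the variables.  The helper labs() is a hypothetical convenience function for this sketch, not part of R.

```r
# Hypothetical helper: the term labels R keeps after simplifying a formula.
labs <- function(f) attr(terms(f), "term.labels")

labs(~ x:x)          # "x"          (the repeated x is dropped)
labs(~ x:z:x)        # "x:z"        (the second x is dropped)
labs(~ x:z:log(z))   # "x:z:log(z)" (z and log(z) are different symbols)
```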

So x:z:log(z) contains no duplicates and produces x*z*log(z).  x:z contains no duplicates and produces x*z (even if z=x), but x:z:x produces x*z and x:x produces x.
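The four cases above can be checked numerically with model.matrix().  This sketch assumes made-up positive data (so that log(z) is defined) and copies x into z, so the two are numerically identical but syntactically different; the mm_* names are illustrative.

```r
set.seed(2)
x <- runif(10, 1, 2)   # positive, so log(z) is defined
z <- x                 # same numbers as x, but a different name

mm_xz    <- model.matrix(~ x:z - 1)         # no duplicates: x * z, here x^2
mm_xx    <- model.matrix(~ x:x - 1)         # duplicate x dropped: just x
mm_xzx   <- model.matrix(~ x:z:x - 1)       # duplicate x dropped: x * z
mm_xzlog <- model.matrix(~ x:z:log(z) - 1)  # no duplicates: x * z * log(z)

all.equal(as.vector(mm_xz), x^2)
all.equal(as.vector(mm_xx), x)
all.equal(as.vector(mm_xzx), x^2)
all.equal(as.vector(mm_xzlog), x * z * log(z))
```

Each design matrix has a single column, and the duplicate check is on the symbols in the formula, not on the values they hold.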

>  What is the subclasses factor? Is it interaction(M_1, M_2, sep = "")?
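A hedged way to probe the question numerically, not an answer given in the thread: for two factors, fitting f:g and fitting interaction(f, g, sep = "") should span the same column space (one column per cell), so the fitted values should agree.  The balanced made-up data frame d and the factor names f and g are assumptions of this sketch.

```r
# Illustrative balanced design: 2 x 3 cells, 4 replicates per cell.
set.seed(3)
d <- expand.grid(f = factor(c("a", "b")),
                 g = factor(c("u", "v", "w")),
                 rep = 1:4)
d$y <- rnorm(nrow(d))

fit_colon <- lm(y ~ f:g, data = d)                         # ':' between two factors
fit_inter <- lm(y ~ interaction(f, g, sep = ""), data = d) # explicit subclass factor

fit_colon$rank   # one parameter per cell
fit_inter$rank
all.equal(fitted(fit_colon), fitted(fit_inter))
```

The f:g fit with an intercept is rank deficient (lm() reports an aliased coefficient as NA), but its rank and fitted values match the subclass-factor fit, since both reduce to the six cell means.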


You might find the Wilkinson & Rogers paper more helpful:

   author       = "G. N. Wilkinson and C. E. Rogers",
   title        = "Symbolic description of factorial models for analysis of variance",
   journal      = "Applied Statistics",
   volume       = "22",
   pages        = "392--399",
   year         = "1973",
   comment      = "Reference from MASS",

The notation is slightly different; R uses ':' for their '.' and '^' for their '**'.  I think the algebra is the same.
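To illustrate the notation mapping, the sketch below shows how R expands '^' (their '**') and '*' into main effects and ':' products (their '.').  As before, labs() is a hypothetical helper and a, b, c are placeholder variable names that terms() never evaluates.

```r
# Hypothetical helper: the term labels R produces from a formula.
labs <- function(f) attr(terms(f), "term.labels")

labs(~ a * b)            # "a" "b" "a:b"
labs(~ (a + b + c)^2)    # "a" "b" "c" "a:b" "a:c" "b:c"
```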


Thomas Lumley
Professor of Biostatistics
University of Washington, Seattle
