R-beta: R0.62.3 problems

Prof Brian D Ripley ripley at stats.ox.ac.uk
Wed Sep 2 08:58:19 CEST 1998

[Should this divert to R-devel: seems to me to be more appropriate there?
Even more appropriate if it had been discussed during development, not two
releases later!]

On Wed, 2 Sep 1998, John Maindonald wrote:

> I consider that the S behaviour is misleading and confusing, and should
> not be copied.  I'd been pleased to find that R did use the same form of
> labelling for 3+-level factors as for binary factors.

I suspect you meant `for binary factors as for 3+-level factors in S!' This
was an undocumented difference between R and S, and as I believe deliberate
differences should be documented, I suspect this was not one of them. 
(Certainly when I raised it, no one suggested that it was deliberate.)

> For 3+ level factors in S, one can tell from the output whether the
> parameterisation is "helmert" or "contrast".  

How does one do this? If you mean from the printed output of a print or
summary method, I know how to do this only if those were the only two
possibilities, which they are not. 

Let me point out that in 0.62.3 (but not 0.62.1) you can find out what the
coding used is by looking at the contrasts component of the object, so you
_can_ find the parametrization (as my dictionary spells it) from the output
of lm or glm, whatever the coding (and there is an essentially infinite set
of possibilities). 
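
A minimal sketch of what I mean, assuming a current R session with default
options. The data frame `d` and its columns are invented for illustration;
only lm and the contrasts component come from the discussion above.

```r
# Hypothetical data: a three-level factor and a numeric response.
set.seed(1)
d <- data.frame(
  f = factor(rep(c("a", "b", "c"), each = 4)),
  y = rnorm(12)
)

fit <- lm(y ~ f, data = d)

# The 'contrasts' component is a list with one entry per factor in the
# model, recording the coding that was used when the fit was made.
fit$contrasts
```

The same component is present on glm fits, so the check does not depend on
which fitting function produced the object.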

> Actually I consider that the output ought to identify what
> parameterisation has been used.  I consider, also, that the S decision
> to make "helmert" the default is unsatisfactory.  While helmert makes
> sense for computation, it is almost never sensible for output.  

The coding _is_ now contained in the R output. I think you are thinking in
SAS-like terms and mean that you want that output printed by the print or
print.summary methods (which?), and/or by print.coef methods, or precisely
what? Given that each factor can have a different coding, this could lead
to a very much larger output. (You did appreciate that arbitrary contrasts
could be attached to each factor?)  Would it not be better (and in the
spirit of S) to have a separate function to print out the codings used, or
to print out a coding-free view of the fit? (Hint: there is such a function
in S.) 
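
To illustrate the point about per-factor codings, here is a hedged sketch in
current R. The data and the names `a`, `b` and `d` are made up; the choice of
contr.helmert and contr.sum is arbitrary, and dummy.coef is used here as one
possible coding-free view, not as the only one.

```r
# Hypothetical balanced data with two factors.
set.seed(2)
d <- data.frame(
  a = factor(rep(c("lo", "mid", "hi"), times = 4)),
  b = factor(rep(c("x", "y"), each = 6)),
  y = rnorm(12)
)

# Attach a different coding to each factor.
contrasts(d$a) <- contr.helmert(3)  # Helmert coding for the 3-level factor
contrasts(d$b) <- contr.sum(2)      # sum-to-zero coding for the 2-level factor

fit <- lm(y ~ a + b, data = d)

# One coding matrix per factor: already more output than the coefficients.
fit$contrasts

# A separate function giving one coefficient per factor level, which is
# easier to read than coefficients tied to a particular coding.
dc <- dummy.coef(fit)
dc
```

With many factors the list of coding matrices grows quickly, which is why I
would rather keep it out of the default printed output.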

You can very easily set the default coding for your own use, so what is
your concern over the global default?  In balanced designs I would say that
a (block-)orthogonal coding is much less likely to mislead, but I at least
do not wish to impose my views on the rest of the users. 
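
For the record, a sketch of setting the default for your own use via
options(contrasts = ...) in current R. The data frame `d` and the object
names are hypothetical; the first element of the option applies to unordered
factors and the second to ordered ones.

```r
# Hypothetical data: a three-level factor and a numeric response.
set.seed(1)
d <- data.frame(
  f = factor(rep(c("a", "b", "c"), each = 4)),
  y = rnorm(12)
)

# Switch the session default to Helmert coding, keeping the old settings.
old <- options(contrasts = c("contr.helmert", "contr.poly"))
fit_helmert <- lm(y ~ f, data = d)

# Restore whatever the session default was before.
options(old)
fit_default <- lm(y ~ f, data = d)

fit_helmert$contrasts$f   # "contr.helmert"
fit_default$contrasts$f   # the restored session default

# The coding changes the coefficients but not the fitted model.
all.equal(fitted(fit_helmert), fitted(fit_default))
```

Putting the options() call in a personal startup file would make the choice
permanent for one user without touching the global default.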

> Perhaps the issue is that the handling of the computation, which ought
> to be hidden from the user, should be separated from the
> parameterisation of the output.  In fact, in both S and (I expect) in
> R, they are linked.

I agree that they should be separate, and that is the point of a lot of my
recent work on filling holes in R. There is an important point here. lm can
be used in more than one way; print.lm is designed for regression and
print.aov for analysis of variance, with model.tables and dummy.coef to
examine the output in a coding-independent way.  So I suspect the `output'
you are complaining about may be from inappropriate tools, and there is
certainly room for you to contribute new tools expressing printed output
you find illuminating.
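
As a rough sketch of the two routes in current R, assuming an invented
balanced block design (the names `block`, `trt` and `d` are hypothetical):

```r
# Hypothetical randomized block design: 4 blocks, 3 treatments.
set.seed(3)
d <- data.frame(
  block = factor(rep(1:4, each = 3)),
  trt   = factor(rep(c("A", "B", "C"), times = 4)),
  y     = rnorm(12)
)

# Regression view: print.lm shows coefficients under the current coding.
fit_lm <- lm(y ~ block + trt, data = d)
print(fit_lm)

# Analysis of variance view: print.aov shows sums of squares instead.
fit_aov <- aov(y ~ block + trt, data = d)
print(fit_aov)

# Coding-independent summaries of the same fit.
mt <- model.tables(fit_aov, type = "means")  # tables of means per term
mt
dummy.coef(fit_aov)                          # one coefficient per level
```

Neither model.tables nor dummy.coef output depends on which contrasts were
in force, which is the separation being asked for.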

Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
