R-beta: R0.62.3 problems
John Maindonald
john.maindonald at anu.edu.au
Wed Sep 2 00:42:51 CEST 1998
From: Prof Brian Ripley <ripley at stats.ox.ac.uk>
> From: Jim Lindsey <jlindsey at alpha.luc.ac.be>
>
> > 2. Binary factor variables no longer have the category label stuck on
> > the end of the variable name in output from glm(). This is very
> > misleading for at least two reasons: (i) there is no way to tell if a
> > variable is factor or not just by looking at the output, (ii) in
> > various contexts, the level printed out may be the first or the
> > second, and it is essential to know which.
>
> That is a `very misleading' statement over a small point! I was
> responsible for that change, which was made a while ago (prior to
> 0.62.2) in contrasts() as part of a tidying up for compatibility with
> S. If this is `essential', why has no one found it necessary to change
> S in the decade (to my knowledge) that it has had no such suffix? (By
> `category label stuck on the end of the variable name in output from
> glm()' I assume you meant to write that the level of a factor is
> appended to the factor name in forming contrast names for treatment
> contrasts. Which is not so `misleading'.) Further:
I consider that the S behaviour is misleading and confusing, and should
not be copied. I'd been pleased to find that R did use the same form of
labelling for 3+-level factors as for binary factors.
For 3+-level factors in S, one can tell from the output whether the
parameterisation is "helmert" or "treatment". For binary factors the
labelling is, in my view confusingly, identical. I am sensitive to
this because I sorted this point out for someone last week. (In this
special [binary] case the coefficients and SEs for "helmert" are
smaller by a factor of 2 than those for treatment contrasts.)
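
The factor of 2 can be checked by hand. What follows is only a minimal
sketch in plain Python rather than S or R, so that the coding of the
design column is written out explicitly. The data are made up, the helper
`fit_line` is hypothetical, and it assumes that for a two-level factor the
treatment coding is 0/1 and the Helmert coding is -1/+1.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, se_b)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual variance on n - 2 degrees of freedom, then SE of the slope.
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = (rss / (n - 2) / sxx) ** 0.5
    return a, b, se_b

# Made-up data: a binary factor with levels "a" and "b".
levels = ["a", "a", "a", "b", "b", "b"]
y = [1.0, 2.0, 3.0, 5.0, 7.0, 6.0]

# Assumed codings for a two-level factor.
treatment = [0.0 if lv == "a" else 1.0 for lv in levels]  # 0/1
helmert = [-1.0 if lv == "a" else 1.0 for lv in levels]   # -1/+1

a_t, b_t, se_t = fit_line(treatment, y)
a_h, b_h, se_h = fit_line(helmert, y)

print("treatment: coef", b_t, "SE", se_t)
print("helmert:   coef", b_h, "SE", se_h)
```

Under these assumptions the treatment coefficient is the difference
between the two level means and the Helmert coefficient is half that
difference, with the SEs in the same ratio, so identical labels in the
output can hide a twofold difference in what is being reported.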
Actually I consider that the output ought to identify what
parameterisation has been used. I consider, also, that the S decision
to make "helmert" the default is unsatisfactory. While helmert makes
sense for computation, it is almost never sensible for output.
Perhaps the issue is that the handling of the computation, which ought
to be hidden from the user, should be separated from the
parameterisation of the output. In fact, in both S and (I expect) in
R, they are linked.
John Maindonald email : john.maindonald at anu.edu.au
Statistical Consulting Unit, phone : (6249)3998
c/o CMA, SMS, fax : (6249)5549
John Dedman Mathematical Sciences Building
Australian National University
Canberra ACT 0200
Australia
