[R] Writing functions

Thomas Lumley tlumley at u.washington.edu
Mon Dec 3 20:28:00 CET 2001

On Mon, 3 Dec 2001, Göran Broström wrote:

> I want to rewrite my function(s) so that they get the elegance of
> lm, coxph, etc, with formulas. Where can I find a document that
> describes, in one place, how to do it? I am now reading the code
> of 'coxph' and 'lm', which gives me lots of functions to look up,
> like match.call, match.arg, terms, eval, and so on, but I don't get
> the overview I need from that.

Some of us have found that it's best to start using formulas without
waiting to understand how everything works.  For most purposes there is a
standard set of incantations to be uttered that give everything you need.
It's easier to understand after you have it working and can step through
to see what each piece does.  There's some explanation in
`S Programming' and in `Statistical Models in S'.

The hard work is done by model.frame() and model.matrix(), and the main
problem is that model.frame() really needs to be run in the calling
environment rather than inside your function (since that's where all the
variables are).

Let's start with


You want to construct a model frame with all the variables needed for the
formula, and probably a design matrix.

    mf <- match.call()                  # get a copy of the call
    mf[[1]] <- as.name("model.frame")   # turn it into a call to model.frame
    mf$someotheroption <- NULL          # remove options that don't go to model.frame
    mf <- eval(mf, parent.frame())      # run model.frame in the calling environment
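
To see those four lines in context, here is a minimal sketch of a complete
wrapper. The names myframe and someotheroption and the toy data are made up
for illustration; only match.call(), as.name(), eval(), parent.frame() and
model.frame() are real R functions.

```r
# Hypothetical wrapper: 'myframe' and 'someotheroption' are illustrative names.
myframe <- function(formula, data, someotheroption = FALSE) {
    mf <- match.call()                  # the call as the user typed it
    mf[[1]] <- as.name("model.frame")   # same arguments, different function
    mf$someotheroption <- NULL          # model.frame() should not see this one
    eval(mf, parent.frame())            # evaluate where the caller's variables live
}

d <- data.frame(y = c(1.2, 2.3, 3.1, 4.8), x = c(1, 2, 3, 4))
z <- c(0, 1, 0, 1)                      # not in 'd'; lives in the calling environment
mf <- myframe(y ~ x + z, data = d, someotheroption = TRUE)
names(mf)
```

Here z is not a column of d, but it is still found because the call is
evaluated in the caller's environment rather than inside myframe().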

Now that you have a model frame, you can make a model matrix.

    mt <- terms(formula, data = data)
    mm <- model.matrix(mt, mf)

The explicit terms() call is only necessary to handle the notation '.' in
a formula, meaning `all the other variables'; otherwise you could just do

    mm <- model.matrix(formula, mf)
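
As a small illustration of the '.' case, the sketch below uses an invented
data frame d with columns y, a and b. It only shows that terms() with a data
argument expands '.' into the remaining columns; it is not meant as the one
right way to do it.

```r
# Toy data, invented for this example.
d <- data.frame(y = c(1, 3, 2, 5),
                a = c(0, 1, 0, 1),
                b = c(2.5, 1.0, 4.0, 3.5))

f  <- y ~ .
mf <- model.frame(f, data = d)
mt <- terms(f, data = d)          # '.' is expanded using the columns of d
attr(mt, "term.labels")           # the terms that '.' stands for
mm <- model.matrix(mt, mf)
colnames(mm)
```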

* You can define extra terms for your formulas (like strata() and
cluster() in coxph, Error() in aov, offset() in glm), but that is more
complicated. It usually requires a two-step process: extract the special
terms, then rewrite the formula and re-run model.frame() [for offset(),
all this is handled internally by model.frame()]. Read those functions to
see ways of doing it.
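
A rough sketch of the idea, using the specials argument of terms(). The
special grp() and the data are hypothetical, and this is a simplified variant
(it drops the special term from the terms object instead of re-running
model.frame()); it is not claimed to be what coxph or aov do internally.

```r
# 'grp' is a hypothetical special; it only marks its argument.
grp <- function(x) x

d <- data.frame(y = c(1.0, 2.5, 2.0, 4.0),
                x = c(1, 2, 3, 4),
                g = c(1, 1, 2, 2))

mt <- terms(y ~ x + grp(g), specials = "grp")
sp <- attr(mt, "specials")$grp    # index of grp(g) among the variables (response counts)
mf <- model.frame(mt, data = d)   # grp() is evaluated like any other function
groups <- mf[[sp]]                # pull the special variable out of the frame

# Find the term(s) involving the special and drop them before
# building the design matrix.
special_terms <- unname(which(attr(mt, "factors")[sp, ] > 0))
mt2 <- drop.terms(mt, special_terms, keep.response = TRUE)
mm  <- model.matrix(mt2, mf)
colnames(mm)
```

The grouping variable ends up in groups for separate handling, and the
design matrix contains only the ordinary terms.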

* An alternative to evaluating in parent.frame() is to evaluate in
environment(formula), the place where the formula was defined.  This will
usually be the same place and, when it isn't, will often be the better
choice. However, it isn't compatible with S and might not be compatible
with other functions.
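
A minimal sketch of a case where the two places differ: the formula is
created inside a helper function and refers to a variable local to it. The
function make_formula() and the data are invented for illustration. The call
is built from the objects themselves, not from symbols, so that it does not
depend on names that exist only in the caller.

```r
make_formula <- function() {
    w <- c(10, 20, 30, 40)   # exists only inside make_formula()
    y ~ x + w                # the formula remembers this environment
}

f <- make_formula()
d <- data.frame(y = c(1, 2, 3, 4), x = c(4, 3, 2, 1))

cl <- call("model.frame", formula = f, data = d)
mf <- eval(cl, environment(f))   # evaluate where the formula was defined
names(mf)
mf$w
```

The variable w is not visible from where f is used, but it is found because
the evaluation happens in the formula's own environment.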

