[R] lm: how are polynomial functions interpreted?

Tue Jan 13 00:50:06 CET 2009

I find the simplest way to interpret a *linear* model formula, as used by lm() and aov(), is to take the left hand side as specifying the response variable (or variables) and the right hand side as specifying the *columns of the model matrix* in a coded way.  Notice that the parameters are implicit and do not occur anywhere in the formula.

To take your example, yin ~ I(sin(x)) [which you could simply write as yin ~ sin(x)] would specify yin as the response and a model matrix with two columns, namely 1 for the intercept term and sin(x):

	X = [1 sin(x)]

So the model you would be fitting, in a more conventional notation, would be 

	Y = a + b*sin(x) + error.  
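To see this concretely, here is a minimal sketch on simulated data.  The values a = 2 and b = 3, the noise level and the seed are my own assumptions for illustration, not anything from your data.  model.matrix() shows the matrix that the formula encodes, and lm() then estimates a and b.

```r
set.seed(1)                                    # arbitrary seed, for reproducibility only
x   <- seq(0, 2 * pi, length.out = 50)
yin <- 2 + 3 * sin(x) + rnorm(50, sd = 0.1)    # simulated with a = 2, b = 3 (assumed values)

X <- model.matrix(yin ~ sin(x))                # the model matrix the formula specifies
head(X, 3)                                     # two columns: "(Intercept)" and "sin(x)"

fit <- lm(yin ~ sin(x))
coef(fit)                                      # estimates of a and b, in that order
```

The parameters a and b never appear in the formula; they are simply the coefficients attached to the two columns of X.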

lm() and aov() accommodate only *linear* parameters.  You can recognise a linear parameter by the fact that when you differentiate the right hand side of the model equation with respect to it, the result does not depend on that parameter.
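This differentiation check can be tried out in R itself with D(), the symbolic derivative function in the stats package.  The sketch below applies it to the general sine model from this thread, a + b*sin(d*x + phi); the helper vector is_linear is just an illustrative name of mine.

```r
rhs <- quote(a + b * sin(d * x + phi))   # right hand side of the model equation

D(rhs, "a")      # does not involve a
D(rhs, "b")      # does not involve b
D(rhs, "d")      # still involves d
D(rhs, "phi")    # still involves phi

# A parameter is linear if it is absent from its own derivative;
# all.vars() lists the symbols that occur in an expression.
is_linear <- sapply(c("a", "b", "d", "phi"),
                    function(p) !(p %in% all.vars(D(rhs, p))))
is_linear
```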

To take your other model

	Y = a + b*sin(d*x + phi) + error

(you left out the error, BTW), clearly a and b are linear parameters but d and phi are not, so you cannot fit this model directly with lm() or aov().  If you knew d and phi, of course, you could fit it since the remaining parameters are all linear and you would specify it using  y ~ sin(d*x+phi)  where d and phi would need to have values at the time of fitting.
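As a minimal sketch of the known-d-and-phi case, on simulated data (the values of a, b, d and phi, the noise level and the seed are all assumptions chosen for illustration):

```r
set.seed(2)                                   # arbitrary seed
x   <- seq(0, 10, length.out = 80)
d   <- 1.5                                    # treated as known constants
phi <- 0.4
y   <- 1 + 2 * sin(d * x + phi) + rnorm(80, sd = 0.1)   # simulated with a = 1, b = 2

# d and phi are ordinary variables here; lm() looks up their values when it
# builds the model matrix, so only a and b are estimated.
fit_known <- lm(y ~ sin(d * x + phi))
coef(fit_known)                               # estimates of a and b
```

If d or phi is changed afterwards, the fit has to be redone; lm() makes no attempt to estimate them.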

The simplest way to fit this kind of model is to use nls().  You can even exploit the fact that a and b are linear parameters by using the "plinear" algorithm, but I'll leave you to sort that one out.  You can also re-write the model so that you have just one non-linear parameter, but again, you can sort that out.
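For anyone who wants a starting point, here is one possible sketch of both ideas on simulated data.  The true values, the noise level, the seed and the starting values are assumptions of mine, and nls() can fail to converge if the starting value for d is far from the truth.  The second fit uses the identity b*sin(d*x + phi) = b1*sin(d*x) + b2*cos(d*x), with b1 = b*cos(phi) and b2 = b*sin(phi), which leaves d as the only non-linear parameter.

```r
set.seed(3)                                   # arbitrary seed
x <- seq(0, 10, length.out = 80)
y <- 1 + 2 * sin(1.5 * x + 0.4) + rnorm(80, sd = 0.1)   # a = 1, b = 2, d = 1.5, phi = 0.4

# "plinear": the right hand side is a matrix whose columns multiply the linear
# parameters, so starting values are needed only for d and phi.
fit1 <- nls(y ~ cbind(1, sin(d * x + phi)),
            start = list(d = 1.45, phi = 0.5), algorithm = "plinear")
coef(fit1)        # d and phi first, then the linear coefficients a and b

# Re-written model: only d is non-linear.
fit2 <- nls(y ~ cbind(1, sin(d * x), cos(d * x)),
            start = list(d = 1.45), algorithm = "plinear")
cf <- unname(coef(fit2))                      # order: d, a, b1, b2

b_hat   <- sqrt(cf[3]^2 + cf[4]^2)            # recover b from b1 and b2
phi_hat <- atan2(cf[4], cf[3])                # recover phi
c(d = cf[1], a = cf[2], b = b_hat, phi = phi_hat)
```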

_______

I think the reason why people were perhaps looking a little askance at this kind of question on R help is that there are plenty of books around where this sort of issue is really done to death.  The introduction to R from the help menu of R is one place where you might start, but there are better ones and now plenty of them.

Bill Venables
http://www.cmis.csiro.au/bill.venables/ 

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of Carl Witthoft
Sent: Tuesday, 13 January 2009 8:58 AM
To: r-help at r-project.org
Subject: Re: [R] lm: how are polynomial functions interpreted?

Well..... *_* ,

I think it should have been clear that this was not a question for which 
any code exists.  In fact, I gave two very specific examples of function 
calls.  The entire point of my question was not "what's up with my 
(putative) code and data " but rather to try to understand the 
overarching philosophy of the way lm() treats the function it's given.

I do understand the sneaky ways to make it do a linear fit with or 
without forcing the origin.  And, sure, I could have run a data set thru 
a bunch of different quadratic-like functions to try to see what happens.

Let me pick a more complicated example.  The general case of a sin fit 
might be Y = a + b*sin(d*x+phi) (where, to be pedantic, x is the only 
data input; all others are coefficients to be found).

If I try  y<-lm(yin~I(sin(x))), what is the actual fit function?  And so on.

That's why I was hoping for a more general explanation of what lm() does.

Charles C. Berry wrote:
> On Mon, 12 Jan 2009, cgw at witthoft.com wrote:
> 
> [nothing deleted]
> 
> matplot(1:100, lm(rnorm(100)~poly(1:100,4),x=TRUE)$x ) # for example
> 
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
> 
> Ahem!
> 
>> and provide commented, minimal, self-contained, reproducible code.
> ......^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:cberry at tajo.ucsd.edu                UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
> 
> 
>
