[R] using poly in a linear regression in the presence of NA fails (despite subsetting them out)
Markus Jäntti
markus.jantti at iki.fi
Mon Feb 14 13:52:14 CET 2005
I ran into a to me surprising result on running lm with an orthogonal
polynomial among the predictors.
The lm command resulted in
Error in qr(X) : NA/NaN/Inf in foreign function call (arg 1)
Error during wrapup:
despite my using a "subset" in the call to get rid of NA's.
poly is apparently evaluated before any NA's are subsetted out
of the data.
Example code (attached to this e-mail as as a script):
> ## generate some data
> n <- 50
> x <- runif(n)
> a0 <- 10
> a1 <- .5
> sigma.e <- 1
> y <- a0 + a1*x + rnorm(n)*sigma.e
> tmp.d <- data.frame(y, x)
> rm(list=c("n", "x", "a0", "a1", "sigma.e", "y"))
>
> print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d)
+
+ ## now make a few NA's
+
+ tmp.d$x[1:2] <- rep(NA, 2)
Error: syntax error
Error during wrapup:
>
> ## this fails, just as it should
> print(lm.1 <- lm(y ~ poly(x, 2), data = tmp.d))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d)
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
>
> ## these also fail, but should not?
>
> print(lm.2 <- lm(y ~ poly(x, 2), data = tmp.d, subset = !is.na(x)))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, subset = !is.na(x))
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
> print(lm.3 <- lm(y ~ poly(x, 2), data = tmp.d, na.action = na.omit))
Call:
lm(formula = y ~ poly(x, 2), data = tmp.d, na.action = na.omit)
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
>
> ## but this works
>
> print(lm.3 <- lm(y ~ poly(x, 2), data = subset(tmp.d, subset =
!is.na(x))))
Call:
lm(formula = y ~ poly(x, 2), data = subset(tmp.d, subset = !is.na(x)))
Coefficients:
(Intercept) poly(x, 2)1 poly(x, 2)2
10.380 -0.242 -1.441
--------------------
The documentation of lm is *not* misleading at this point, saying that
subset an optional vector specifying a subset of observations to be
used in the fitting process.
which implies that data are subsetted once lm.fit is called.
All the same, this behavior is a little unexpected to me.
Is it to be considered a feature, that is, does it produce beneficial
side effects which explain why it works as it does?
Regards,
Markus
I am running R on a Debian testing system with kernel 2.6.10 and
> version
_
platform i386-pc-linux-gnu
arch i386
os linux-gnu
system i386, linux-gnu
status
major 2
minor 0.1
year 2004
month 11
day 15
language R
--
Markus Jantti
Abo Akademi University
markus.jantti at iki.fi
http://www.iki.fi/~mjantti
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: poly-example.R
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20050214/68cdf3f1/poly-example.pl
More information about the R-help
mailing list