[R] lm: mark sample used in estimation
Anirban Mukherjee
anirban.mukherjee at gmail.com
Mon Jul 11 09:55:44 CEST 2011
Hi all,
I want to mark the estimation sample, that is, to identify which rows
(observations) lm deletes due to missingness. For example, starting
from the original example in the help page, I have changed one of the
values in trt to NA (missing).
# code below
# ----
# original example
> ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
# set the 8th observation of trt (the 18th element of weight) to NA
> trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,NA,4.32,4.69)
> group <- gl(2,10,20, labels=c("Ctl","Trt"))
> weight <- c(ctl, trt)
> lm.D9 <- lm(weight ~ group)
> summary(lm.D9)
Call:
lm(formula = weight ~ group)
Residuals:
     Min       1Q   Median       3Q      Max
-1.04556 -0.48378  0.05444  0.23622  1.39444
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   5.0320     0.2258  22.281 5.09e-14 ***
groupTrt     -0.3964     0.3281  -1.208    0.244
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7142 on 17 degrees of freedom
(1 observation deleted due to missingness)
Multiple R-squared: 0.07907, Adjusted R-squared: 0.0249
F-statistic: 1.46 on 1 and 17 DF, p-value: 0.2435
# ------
# end snippet
I want to generate an indicator variable to mark the observations used
in estimation: 1 for a row not deleted, 0 for a row deleted. In this
case I want an indicator variable that has seventeen 1s, one 0, and
then two 1s. I know I can do ind = !is.na(weight) in the above example
(the NA is in weight, not in group). But ideally I am looking for a way
that works with any formula in lm and still marks the estimation sample.
Is there a function or option I am missing? The best I could come up with:
> lm.D9 <- lm(weight ~ group, model=TRUE)
> ind <- as.numeric(row.names(lm.D9$model))
> esamp <- rep(0,length(group)) # with a data frame, use nrow() of the data passed to lm instead of length(group)
> esamp[ind] <- 1
> esamp
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
Is this "safe" (and recommended)?
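One possible alternative, offered as a sketch rather than a recommended answer: the row-name approach above relies on the model frame having the default integer row names, so it would presumably break for a data frame with character row names. Under the assumption that lm is run with the default na.action = na.omit, the positions of the dropped rows can instead be read from the fitted object with na.action(). The names fit and dropped are introduced here for illustration only.

```r
# Same data as in the example above: the 8th value of trt is missing.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,NA,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl", "Trt"))
weight <- c(ctl, trt)

# Assumption: rows with missing values are dropped via na.omit (the usual default).
fit <- lm(weight ~ group, na.action = na.omit)

# na.action() gives the positions of the dropped rows, or NULL if none were dropped.
dropped <- as.integer(na.action(fit))

# Start from "all rows used", then zero out the dropped positions.
esamp <- rep(1, length(weight))
esamp[dropped] <- 0
esamp
# [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1
```

Because the positions come from the fit itself, this does not depend on which variables appear in the formula. With a data frame passed via data=, length(weight) would be replaced by nrow() of that data frame.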
Appreciate any help.
Best, A