[R] lm() and dffits
Dieter Menne
dieter.menne at menne-biomed.de
Sun Aug 31 22:07:41 CEST 2008
Ranney, Steven <steven.ranney <at> montana.edu> writes:
> 1) fit a simple lm(LW~LL)
> 2) calculate the dffits for those data points
> 3) remove those data points whose dffits exceed 2*sqrt(p/n) (where p=the number of
> parameters and n=number of data points; p=3 in a linear model, correct?
> Intercept, slope, and error term?)
> 4) rerun the model MINUS those data points
> 5) compare the two lm()
>
> Now, each of these steps I can do separately, but only by outputting the
> dffits to a .csv then removing the large dffits by hand, reading the .csv
> back into R, rerunning the lm(), and comparing the first lm() to the second
> lm(). I would imagine that there is a better (easier, I hope!) way to doing
> all of this. Any ideas?
>
You could do the following:
# --------------------
x = rnorm(100)
y = rnorm(100)
y[40] = y[40]+30 # generate an outlier
df = data.frame(x=x,y=y)
lmfit1 = lm(y~x, data=df) # fit all data
thresh = 3 # Choose any data-dependent threshold
nice = abs(dffits(lmfit1)) < thresh # TRUE for points to keep
# note that nice[40] is the only FALSE
df2 = df[nice,]
lmfit2 = lm(y~x, data=df2)
summary(lmfit1)
summary(lmfit2)
# --------------------
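If you want the 2*sqrt(p/n) cutoff from your step 3 rather than a fixed threshold, something along these lines might do. This is only a sketch: it assumes p counts the estimated coefficients (intercept and slope, so p = 2 for a simple regression, with the error term not counted), and the names fit_all, fit_trim, keep and cutoff are made up for the illustration.

```r
set.seed(1)                            # assumption: fixed seed, only for repeatability
df <- data.frame(x = rnorm(100), y = rnorm(100))
df$y[40] <- df$y[40] + 30              # same kind of artificial outlier as above

fit_all <- lm(y ~ x, data = df)        # fit all data

p <- length(coef(fit_all))             # number of coefficients (2 here)
n <- nrow(df)                          # number of data points
cutoff <- 2 * sqrt(p / n)              # size-adjusted dffits cutoff

keep <- abs(dffits(fit_all)) <= cutoff # TRUE for points to keep
fit_trim <- lm(y ~ x, data = df[keep, ])

which(!keep)                           # row numbers that were dropped
rbind(all = coef(fit_all), trimmed = coef(fit_trim))
```

The last line puts the two sets of coefficients side by side, which covers step 5 without any detour through a .csv file.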
However, this is a bit Denver-Style Home-Brewery. Instead of using this
ad-hoc method, you are probably better off using one of the robust methods, for
example rlm() in MASS.
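A minimal sketch of the robust route, assuming rlm() from MASS with its default settings (a Huber-type M-estimator) suits your data; the simulated data and the names fit_ols and fit_rob are only for illustration:

```r
library(MASS)                          # recommended package, shipped with R

set.seed(1)                            # assumption: fixed seed, only for repeatability
df <- data.frame(x = rnorm(100), y = rnorm(100))
df$y[40] <- df$y[40] + 30              # artificial outlier

fit_ols <- lm(y ~ x, data = df)        # ordinary least squares
fit_rob <- rlm(y ~ x, data = df)       # robust fit, downweights large residuals

rbind(ols = coef(fit_ols), robust = coef(fit_rob))
fit_rob$w[40]                          # final weight given to the outlier
```

Here no point is deleted. The outlier stays in the data but gets a small weight, so you do not have to pick a cutoff at all.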
Dieter