[R] {car} outlierTest looses p/q values

John Fox jfox at mcmaster.ca
Fri Oct 17 20:54:37 CEST 2014


Dear Phil,

Yes, that's a bit clearer. One can invent data configurations where certain studentized residuals are undefined. For example, try the following:

y <- c(0, 0, 0, 0, 0, 1)
x <- 1:6
xx <- (1:6 - 3.5)^2
rstudent(lm(y ~ x))
rstudent(lm(y ~ xx))
plot(x, y)
plot(xx, y)

The plots should clarify what's going on.

I'm copying to r-help since the discussion began there.

I hope this helps,
 John

On Fri, 17 Oct 2014 18:35:59 +0200
 Philip Stevens <chobophil at gmail.com> wrote:
> Dear John,
> 
> Thank you for your fast reply. Unfortunately I am out of office right now
> but I will get back to you on Monday asap with a toy example and some code.
> Meanwhile let me try to explain further.
> 
> Basically not the glm but the outlierTest afterwards fails. And it only
> fails if all values, used to set up the glm, are exactly 1 except for one
> value which can be arbitrary large. I construct the glm from a vector of
> doubles (the target values) and a vector of other numeric values(metadata).
> (So this should be fit <- glm(targetVector ~ metadata) ) And want to check
> if one of those doubles(in the target vector) is an outlier using
> outlierTest(fit). The residuals are calculated by the glm but the
> outlierTest does not report a p nor a q value for the one value in the
> target vector which is not 1. And I can't figure out why...
> 
> I hope this makes it a bit clearer.
> Anyways I will come back to you on Monday.
> 
> Best,
> 
> Phil
> Am 17.10.2014 17:43 schrieb "John Fox" <jfox at mcmaster.ca>:
> 
> > Dear Phil,
> >
> > After reading your posting several times, I still don't understand what
> > you did. As usual, having a reproducible example illustrating the error
> > would be a great help. I do have a guess about the source of the error:
> > glm() failed in some way for the problematic case.
> >
> > Best,
> >  John
> >
> > ------------------------------------------------
> > John Fox, Professor
> > McMaster University
> > Hamilton, Ontario, Canada
> > http://socserv.mcmaster.ca/jfox/
> >
> >
> > On Fri, 17 Oct 2014 12:53:30 +0200
> >  Phil <chobophil at gmail.com> wrote:
> > > Hi guys,
> > >
> > > I came across a strange phenomena and can't figure out why it happens by
> > myself so here we go.
> > >
> > > I got a dataframe which consists of double numbers which I want to
> > check, row-wise if there are outliers in the rows.
> > >
> > > So I iterate over the rows and create a glm using the numbers of that
> > particular row. Which might look like this:
> > >
> > > case1)
> > >          x1        x2        x3        x4        x5        x6 x7
> > > x8        x9        x10        x11
> > >      0.00     3.91     0.00     0.00     0.00   68.03   40.39 0.00
> > > 0.00      0.00       4.11
> > >
> > > or like this:
> > > case2)
> > >          x1        x2        x3        x4        x5        x6 x7
> > > x8        x9        x10        x11
> > >       1.00     1.00    1.00     1.00     1.00     1.00     1.00 1.00
> > > 1.00     1.00      5.34
> > >
> > > or any other combination of double numbers...
> > >
> > > however, using a glm like this:
> > >
> > > glModel <- glm(vector ~ some_other_meta_data_which_is_double_numbers)
> > >
> > > and testing it with:
> > >
> > >   test.Res <- outlierTest(glModel,digits=4,cutoff=Inf,n.max=Inf)
> > >
> > > I always get a result consisting of the desired p and q values but not
> > if the vector I use looks like case2. There is no error message and the
> > computation does not stop either.
> > > However, all p and q values are produced except for the last value x11.
> > >
> > > Any idea why this particular value gets dropped from the output of the
> > outlierTest Method in the car package.
> > >
> > > Here is the sessioninfo:
> > >
> > >   sessionInfo()
> > > R version 3.1.1 (2014-07-10)
> > > Platform: x86_64-redhat-linux-gnu (64-bit)
> > >
> > > locale:
> > >   [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
> > >   [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
> > >   [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8
> > >   [7] LC_PAPER=en_US.utf8       LC_NAME=C
> > >   [9] LC_ADDRESS=C              LC_TELEPHONE=C
> > > [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
> > >
> > > attached base packages:
> > > [1] stats     graphics  grDevices utils     datasets  methods base
> > >
> > > other attached packages:
> > > [1] ggplot2_1.0.0      car_2.0-21         RColorBrewer_1.0-5 iNEXT_1.0
> > > [5] vegan_2.0-10       lattice_0.20-29    permute_0.8-3
> > >
> > > loaded via a namespace (and not attached):
> > >   [1] colorspace_1.2-4 compiler_3.1.1   digest_0.6.4 grid_3.1.1
> > >   [5] gtable_0.1.2     labeling_0.3     MASS_7.3-33 munsell_0.4.2
> > >   [9] nnet_7.3-8       plyr_1.8.1       proto_0.3-10 Rcpp_0.11.2
> > > [13] reshape2_1.4     scales_0.2.4     stringr_0.6.2    tools_3.1.1
> > >
> > > Any help is highly appreciated.
> > >
> > > Thanks
> > >
> > > Phil
> > >
> > > ______________________________________________
> > > R-help at r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
> >
> >
> >
> >

------------------------------------------------
John Fox, Professor
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/



More information about the R-help mailing list