[R] detection of outliers
Dimitris Rizopoulos
dimitris.rizopoulos at med.kuleuven.ac.be
Thu Sep 23 16:57:23 CEST 2004
Hi Philippe,
you could consider using the Winsorized mean:
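winds.mean <- function(x, k=2){
    y <- x[!is.na(x)]              # drop missing values
    mu <- mean(y)
    stdev <- sd(y)
    upper <- mu + k*stdev          # cutoffs at k SDs from the mean
    lower <- mu - k*stdev
    outliers.up <- y[y > upper]
    outliers.lo <- y[y < lower]
    # compare against the cutoffs, not against the outlier vectors:
    # y == outliers.up recycles and fails when there are several outliers
    y[y > upper] <- upper
    y[y < lower] <- lower
    list(mean=sum(y)/length(y), outliers.up=outliers.up,
         outliers.lo=outliers.lo)
}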
##################
x <- c(10,11,12,15,20,22,25,30,500)
mean(x)
winds.mean(x)
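
To see what the k-SD rule does on this particular data, here is a minimal, self-contained sketch that reports the cutoffs themselves along with the flagged values. The helper name sd.flags and its return fields are hypothetical, chosen only for illustration; the clamping via pmin/pmax is one possible way to write the Winsorizing step, not necessarily how it has to be done.

```r
# Hypothetical helper (not part of the original post): reports the
# k-SD cutoffs, the values outside them, and the clamped data.
sd.flags <- function(x, k = 2) {
  y <- x[!is.na(x)]                  # drop missing values
  lower <- mean(y) - k * sd(y)       # lower cutoff
  upper <- mean(y) + k * sd(y)       # upper cutoff
  list(lower = lower, upper = upper,
       flagged = y[y < lower | y > upper],       # values beyond either cutoff
       clamped = pmin(pmax(y, lower), upper))    # values pulled back to the cutoffs
}

x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)
res <- sd.flags(x)
res$flagged          # only 500 lies beyond mean + 2 * sd
mean(res$clamped)    # Winsorized mean, smaller than mean(x)
sd.flags(x, k = 3)$flagged   # empty: with k = 3 nothing is flagged here
```

Note that the value 500 itself inflates both the mean and the SD used for the cutoffs, so with a sample this small the choice of k matters: k = 2 flags it, k = 3 does not.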
I hope this helps.
Best,
Dimitris
> Hi,
> this is both a statistical and an R question...
> What would be the best way / test to detect an outlier value among a
> series of 10 to 30 values? For instance, if we have the following
> dataset: 10,11,12,15,20,22,25,30,500, I'd like to have a way to
> identify the last value as an outlier (only one direction). One way
> would be to calculate abs(mean - median) and, if elevated (to what
> extent?), delete the extreme value and then redo. But is it valid to do
> so with so few data? Is (trimmed mean - mean) more efficient?
> If so, what would be the maximal tolerable value to use as a
> threshold? (I guess it will be experiment dependent...) Tests for
> skewness will probably require a larger dataset?
> Any suggestions are very welcome!
> Thanks for your help,
> Philippe Guardiola, MD
>
