[R] detection of outliers

Dimitris Rizopoulos dimitris.rizopoulos at med.kuleuven.ac.be
Thu Sep 23 16:57:23 CEST 2004

Hi Philippe,

you could consider using the Winsorized mean,

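winds.mean <- function(x, k = 2){
    y <- x[!is.na(x)]
    mu <- mean(y)
    stdev <- sd(y)
    # cut-offs: k standard deviations either side of the mean
    upper <- mu + k*stdev
    lower <- mu - k*stdev
    outliers.up <- y[y > upper]
    outliers.lo <- y[y < lower]
    # pull the flagged values back to the cut-offs
    y[y > upper] <- upper
    y[y < lower] <- lower
    list(mean = mean(y), outliers.up = outliers.up,
         outliers.lo = outliers.lo)
}

x <- c(10,11,12,15,20,22,25,30,500)
winds.mean(x)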
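
As a possible complement (a sketch only, not part of the suggestion above): the mean and sd used for the cut-offs are themselves pulled by an extreme value, so one could instead measure each point's distance from the median in units of the MAD. The function name mad.outliers and the default k = 3 are assumptions chosen for illustration, not an established rule for this kind of data.

```r
# Hypothetical helper (not from the original post): flag values lying
# more than k MADs away from the median.
mad.outliers <- function(x, k = 3) {
    y <- x[!is.na(x)]
    centre <- median(y)
    spread <- mad(y)    # stats::mad, scaled by 1.4826 by default
    y[abs(y - centre) > k * spread]
}

x <- c(10, 11, 12, 15, 20, 22, 25, 30, 500)
mad.outliers(x)
```

For this x the median is 20 and the MAD is about 11.9, so only 500 lies beyond the cut-off. The threshold k remains a judgment call and should be chosen with the experiment in mind.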

I hope this helps.


Dimitris Rizopoulos
Ph.D. Student
Biostatistical Centre
School of Public Health
Catholic University of Leuven

Address: Kapucijnenvoer 35, Leuven, Belgium
Tel: +32/16/396887
Fax: +32/16/337015
Web: http://www.med.kuleuven.ac.be/biostat/

----- Original Message ----- 
From: <Phguardiol at aol.com>
To: <r-help at stat.math.ethz.ch>
Sent: Thursday, September 23, 2004 4:22 PM
Subject: [R] detection of outliers

> Hi,
> this is both a statistical and an R question...
> what would be the best way / test to detect an outlier value among a 
> series of 10 to 30 values? For instance, if we have the following 
> dataset: 10,11,12,15,20,22,25,30,500 I'd like to have a way to 
> identify the last value as an outlier (only one direction). One way 
> would be to calculate abs(mean - median) and, if elevated (to what 
> extent?), delete the extreme value then redo.. but is it valid to do 
> so with so few data? Is the (trimmed mean - mean) more efficient? 
> If so, what would be the maximal tolerable value to use as a 
> threshold? (I guess it will be experiment dependent...) Tests for 
> skewness will probably require a larger dataset?
> any suggestions are very welcome !
> thanks for your help
> Philippe Guardiola, MD
