[R] dixon test
giov
biowoman at libero.it
Wed Aug 13 11:59:32 CEST 2008
Hi,
thank you very much for your helpful reply =). Just one question: I don't
know how my data are distributed (normal, t, etc.), so how should I set
the type parameter? Is there a type value I can use for a
distribution-free statistical test?
Thank you so much!
Fernando Marmolejo-Ramos wrote:
>
> hi giov
>
> About the Dixon test: I just ran a simple test with a sample of 40 and
> got:
>
> Error in dixon.test(x) : Sample size must be in range 3-30
>
> So it seems that most of the tests in the "outliers" package are designed
> for small samples. See also the R News article published in May 2006 (vol.
> 6/2), "Processing data for outliers", by Lukasz Komsta (the developer
> of the package).
>
> However, that package also has a function called "scores", which works
> for large samples. It gives you the z scores and p-values for your
> observations, so you can determine which values are considered outliers.
>
> Try this simple syntax:
>
> library(outliers)
> library(gamlss.dist)
>
> # this produces an exponential+Gaussian sample (which usually has
> # heaps of outliers!)
> x <- rexGAUS(100,2000,3000,5000)
>
> # this fails with the same error, because Dixon only works for samples of
> # 3 to 30 observations and x has 100; try() lets the script continue
> try(dixon.test(x))
>
> # just to see what the data set looks like and visually confirm the
> # outliers
> boxplot(x, notch = TRUE)
>
> # sort the observations in ascending order
> sort(x)
>
> # returns, in ascending order, the probability associated with each
> # observation's z score
> sort(scores(x, type = "z", prob = 1))
>
> # lists the observations flagged as outliers at the 0.95 probability level
> # (scores() returns a logical vector here, so use it to index x)
> sort(x[scores(x, type = "z", prob = 0.95)])
>
> This is what the package author says about the "prob" argument:
>
> prob ---- If set, the corresponding p-values instead of scores are given.
> If value is set to 1, p-value are returned. Otherwise, a logical vector is
> formed, indicating which values are exceeding specified probability. In
> "z" and "mad" types, there is also possibility to set this value to zero,
> and then scores are confirmed to (n-1)/sqrt(n) value, according to
> Shiffler (1998). The "iqr" type does not support probabilities, but "lim"
> value can be specified.
>
> The Shiffler reference is not the one that appears in the help. It
> is this one:
>
> Shiffler, R.E. (1988). Maximum Z scores and outliers. Am. Stat. 42(1),
> 79-80.
>
> I hope this helps,
>
> Fernando
>
>
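On the question of a distribution-free option: the help text quoted above lists "mad" and "iqr" types next to "z". Below is a minimal base-R sketch of what scores of that kind look like. It assumes a made-up sample of 40 values with two planted outliers. The variable names and the 3.5 and 1.5 cut-offs are illustrative assumptions, not taken from the "outliers" package, and none of this is a formal test.

```r
# Made-up sample: 38 evenly spaced values plus two planted outliers
# (n = 40, so dixon.test() would refuse it)
x <- c(seq(40, 60, length.out = 38), 95, 110)
n <- length(x)

# Classical z scores: distance from the mean in standard deviations
z <- (x - mean(x)) / sd(x)

# Shiffler (1988): no |z| in a sample of size n can exceed (n - 1) / sqrt(n)
z_bound <- (n - 1) / sqrt(n)
max(abs(z)) <= z_bound

# Robust scores: centre on the median and scale by the MAD
# (mad() applies the 1.4826 consistency constant by default)
z_mad <- (x - median(x)) / mad(x)

# Boxplot-style scores: distance beyond the quartiles, in IQR units
q <- unname(quantile(x, c(0.25, 0.75)))
beyond_iqr <- pmax(q[1] - x, x - q[2], 0) / IQR(x)

# Flag observations using the assumed cut-offs
flag_mad <- x[abs(z_mad) > 3.5]
flag_iqr <- x[beyond_iqr > 1.5]
flag_mad
flag_iqr
```

Both rules use only medians and quartiles, so they are less sensitive to the outliers themselves than the mean and standard deviation behind the plain z score.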