[R] about a p-value < 2.2e-16

Fri Mar 19 16:41:33 CET 2021

Hi Spencer,

Thanks for your test results. I do not know the answer, as I haven't
used wilcox.test for many years. I do not know whether the exact distribution
of the Wilcoxon rank sum statistic can be computed directly, but I think it
is very likely, as the documentation of `Wilcoxon` says:

This distribution is obtained as follows. Let x and y be two random,
independent samples of size m and n. Then the Wilcoxon rank sum statistic
is the number of all pairs (x[i], y[j]) for which y[j] is not greater than
x[i]. This statistic takes values between 0 and m * n, and its mean and
variance are m * n / 2 and m * n * (m + n + 1) / 12, respectively.
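
For what it's worth, these properties are easy to check numerically. The
sketch below is my own illustration, not code taken from the R sources: the
sample sizes m = 4 and n = 6 and all variable names are arbitrary choices. It
tabulates the distribution with dwilcox and compares the pair count described
above with the W reported by wilcox.test.

```r
# Illustrative sketch; m, n and all variable names are arbitrary choices.
m <- 4
n <- 6

# Tabulate the exact null distribution over its full support 0..m*n.
support <- 0:(m * n)
prob <- dwilcox(support, m, n)

total_prob <- sum(prob)                            # should be 1
dist_mean  <- sum(support * prob)                  # should be m * n / 2
dist_var   <- sum((support - dist_mean)^2 * prob)  # should be m * n * (m + n + 1) / 12

# The statistic itself: count the pairs (x[i], y[j]) with y[j] <= x[i].
set.seed(1)
x <- rnorm(m)
y <- rnorm(n)
pair_count <- sum(outer(x, y, ">="))

# W as reported by wilcox.test for the same data (no ties here).
w_reported <- unname(wilcox.test(x, y)$statistic)
```

If the documentation is right, pair_count and w_reported agree, and the three
summaries match 1, m * n / 2 and m * n * (m + n + 1) / 12.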

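A nice feature of a non-parametric statistic is that it is usually
distribution-free: under the null hypothesis its distribution does not depend
on the distribution of the data, so you can pick any continuous distribution
you like and obtain the same distribution for the statistic. I suspect that
is what happens here, but I might be wrong.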
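
To make this concrete, here is a small sketch. It is my own illustration with
arbitrary sizes m = 3 and n = 4, and it is not the algorithm used in the C
code. It assumes no ties, so that under the null hypothesis every choice of m
ranks for x is equally likely. Enumerating those choices with combn gives the
permutation distribution of W, which can be compared with dwilcox. The second
part checks that a strictly increasing transformation of the data (exp here)
leaves the exact p-value unchanged, because only the ranks matter.

```r
# Illustrative sketch; sizes and variable names are arbitrary choices.
m <- 3
n <- 4
N <- m + n

# Each column is one possible set of ranks taken by the x sample.
rank_sets <- combn(N, m)

# W for each rank set: rank sum of x minus its minimum possible value.
w_all <- colSums(rank_sets) - m * (m + 1) / 2

# Relative frequency of each W value over all equally likely rank sets.
perm_dist <- as.vector(table(factor(w_all, levels = 0:(m * n)))) / ncol(rank_sets)

# Distribution used by R for the exact test.
exact_dist <- dwilcox(0:(m * n), m, n)

# Rank invariance: a monotone transformation does not change the exact p-value.
set.seed(1)
x <- rnorm(m)
y <- rnorm(n)
p_raw <- wilcox.test(x, y, exact = TRUE)$p.value
p_exp <- wilcox.test(exp(x), exp(y), exact = TRUE)$p.value
```

If perm_dist and exact_dist agree, then for data without ties the "exact"
distribution and the permutation distribution are the same thing.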

Cheers,
Jiefei

On Fri, Mar 19, 2021 at 10:57 PM Spencer Graves <
spencer.graves using effectivedefense.org> wrote:

>
>
> On 2021-3-19 9:52 AM, Jiefei Wang wrote:
> > After digging into the R source, it turns out that the argument `exact`
> > has nothing to do with numeric precision. It only affects the statistical
> > model used to compute the p-value. When `exact=TRUE`, the true distribution
> > of the statistic is used. Otherwise, a normal approximation is used.
> >
> > I think the documentation needs to be improved here: you can compute the
> > exact p-value *only* when you do not have any ties in your data. If you
> > have ties in your data, you will get the p-value from the normal
> > approximation no matter what value you pass as `exact`. This behavior
> > should be documented, or a warning should be given when `exact=TRUE` and
> > ties are present.
> >
> > FYI, if the exact p-value is required, the `pwilcox` function is used to
> > compute it. There are no details on how it computes the p-value, but its
> > C code seems to build the probability table, so I assume it computes the
> > exact p-value from the true distribution of the statistic, not a
> > permutation or MC p-value.
>
>
>        My example shows that it does NOT use Monte Carlo: repeated calls
> return identical p-values, so it must use some fixed distribution.  I
> believe the term "exact" means that it uses the permutation distribution,
> though I could be mistaken.  If it's NOT a permutation distribution, I
> don't know what it is.
>
>
>        Spencer
> >
> > Best,
> > Jiefei
> >
> >
> >
> > On Fri, Mar 19, 2021 at 10:01 PM Jiefei Wang <szwjf08 using gmail.com> wrote:
> >
> >> Hey,
> >>
> >> I just want to point out that the word "exact" has two meanings. It can
> >> mean the numerically accurate p-value, as Bogdan asked in his first email,
> >> or it can mean the p-value calculated from the exact distribution of the
> >> statistic (in this case, the U statistic). These two are actually not
> >> related, even though both are called "exact".
> >>
> >> Best,
> >> Jiefei
> >>
> >> On Fri, Mar 19, 2021 at 9:31 PM Spencer Graves <
> >> spencer.graves using effectivedefense.org> wrote:
> >>
> >>>
> >>> On 2021-3-19 12:54 AM, Bogdan Tanasa wrote:
> >>>> Thanks a lot, Vivek! In other words, assuming that we work with 1000
> >>>> data points,
> >>>>
> >>>> shall we use EXACT = TRUE, it uses the normal approximation,
> >>>>
> >>>> while if EXACT=FALSE (for these large samples), it does not ?
> >>>
> >>>         As David Winsemius noted, the documentation is not clear.
> >>> Consider the following:
> >>>
> >>> set.seed(1)
> >>> x <- rnorm(100)
> >>> y <- rnorm(100, 2)
> >>>
> >>> wilcox.test(x, y)$p.value
> >>> [1] 1.172189e-25
> >>> wilcox.test(x, y)$p.value
> >>> [1] 1.172189e-25
> >>>
> >>> wilcox.test(x, y, EXACT=TRUE)$p.value
> >>> [1] 1.172189e-25
> >>> wilcox.test(x, y, EXACT=TRUE)$p.value
> >>> [1] 1.172189e-25
> >>>
> >>> wilcox.test(x, y, exact=TRUE)$p.value
> >>> [1] 4.123875e-32
> >>> wilcox.test(x, y, exact=TRUE)$p.value
> >>> [1] 4.123875e-32
> >>>
> >>> wilcox.test(x, y, EXACT=FALSE)$p.value
> >>> [1] 1.172189e-25
> >>> wilcox.test(x, y, EXACT=FALSE)$p.value
> >>> [1] 1.172189e-25
> >>>
> >>> wilcox.test(x, y, exact=FALSE)$p.value
> >>> [1] 1.172189e-25
> >>> wilcox.test(x, y, exact=FALSE)$p.value
> >>> [1] 1.172189e-25
> >>>
> >>> We get two values here: 1.172189e-25 and 4.123875e-32. The first one, I
> >>> think, is the normal approximation, which is the same as exact=FALSE. I
> >>> think that with exact=TRUE, you get a permutation distribution, though
> >>> I'm not sure. You might try looking at "wilcox_test in package coin for
> >>> exact, asymptotic and Monte Carlo conditional p-values, including in the
> >>> presence of ties" to see if it is clearer.
> >>>
> >>> NOTE: R is case sensitive, so "EXACT" is a different name from "exact".
> >>> It is interpreted as an optional argument, which is not recognized and
> >>> therefore ignored in this context.
> >>>            Hope this helps.
> >>>            Spencer
> >>>
> >>>
> >>>> On Thu, Mar 18, 2021 at 10:47 PM Vivek Das <vd4mmind using gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Hi Bogdan,
> >>>>>
> >>>>> You can also get the information from the wilcox.test documentation
> >>>>> page.
> >>>>>
> >>>>> “By default (if exact is not specified), an exact p-value is computed if
> >>>>> the samples contain less than 50 finite values and there are no ties.
> >>>>> Otherwise, a normal approximation is used.”
> >>>>>
> >>>>> For more:
> >>>>>
> >>>>> https://stat.ethz.ch/R-manual/R-devel/library/stats/html/wilcox.test.html
> >>>>>
> >>>>> Hope this helps!
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> VD
> >>>>>
> >>>>>
> >>>>> On Thu, Mar 18, 2021 at 10:36 PM Bogdan Tanasa <tanasa using gmail.com>
> >>>>> wrote:
> >>>>>> Dear Peter, thanks a lot. Yes, we can see a very precise p-value, and
> >>>>>> that was the request from the journal.
> >>>>>>
> >>>>>> If I may ask another question, please: what is the meaning of
> >>>>>> "exact=TRUE" or "exact=FALSE" in wilcox.test?
> >>>>>>
> >>>>>> I can see that the "numerically precise" p-values are different.
> >>>>>> Thanks a lot!
> >>>>>>
> >>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>> tst$p.value
> >>>>>> [1] 8.535524e-25
> >>>>>>
> >>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=FALSE)
> >>>>>> tst$p.value
> >>>>>> [1] 3.448211e-25
> >>>>>>
> >>>>>> On Thu, Mar 18, 2021 at 10:15 PM Peter Langfelder <
> >>>>>> peter.langfelder using gmail.com> wrote:
> >>>>>>
> >>>>>>> I think the answer is much simpler. The print method for hypothesis
> >>>>>>> tests (class htest) truncates the p-values. In the above example,
> >>>>>>> instead of using
> >>>>>>>
> >>>>>>> wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>
> >>>>>>> and copying the output, just print the p-value:
> >>>>>>>
> >>>>>>> tst = wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>> tst$p.value
> >>>>>>>
> >>>>>>> [1] 2.988368e-32
> >>>>>>>
> >>>>>>>
> >>>>>>> I think this value is what the journal asks for.
> >>>>>>>
> >>>>>>> HTH,
> >>>>>>>
> >>>>>>> Peter
> >>>>>>>
> >>>>>>> On Thu, Mar 18, 2021 at 10:05 PM Spencer Graves
> >>>>>>> <spencer.graves using effectivedefense.org> wrote:
> >>>>>>>>          I would push back on that from two perspectives:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>                1.  I would study exactly what the journal said very
> >>>>>>>> carefully.  If they mandated "wilcox.test", that function has an
> >>>>>>>> argument called "exact".  If that's what they are asking, then using
> >>>>>>>> that argument gives the exact p-value, e.g.:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>    > wilcox.test(rnorm(100), rnorm(100, 2), exact=TRUE)
> >>>>>>>>
> >>>>>>>>            Wilcoxon rank sum exact test
> >>>>>>>>
> >>>>>>>> data:  rnorm(100) and rnorm(100, 2)
> >>>>>>>> W = 691, p-value < 2.2e-16
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>                2.  If that's NOT what they are asking, then I'm not
> >>>>>>>> convinced what they are asking makes sense:  There is no such thing
> >>>>>>>> as an "exact p value" except to the extent that certain assumptions
> >>>>>>>> hold, and all models are wrong (but some are useful), as George Box
> >>>>>>>> famously said years ago.[1]  Truth only exists in mathematics, and
> >>>>>>>> that's because it's a fiction to start with ;-)
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>          Hope this helps.
> >>>>>>>>          Spencer Graves
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> [1]
> >>>>>>>> https://en.wikipedia.org/wiki/All_models_are_wrong
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 2021-3-18 11:12 PM, Bogdan Tanasa wrote:
> >>>>>>>>> <https://meta.stackexchange.com/questions/362285/about-a-p-value-2-2e-16>
> >>>>>>>>> Dear all,
> >>>>>>>>>
> >>>>>>>>> I would appreciate having your advice on the following, please:
> >>>>>>>>>
> >>>>>>>>> In R, wilcox.test() reports "p-value < 2.2e-16" when we compare
> >>>>>>>>> expression values for sets of 1000 genes (in the genomics field).
> >>>>>>>>>
> >>>>>>>>> However, the journal asks us to provide the exact p-value ...
> >>>>>>>>>
> >>>>>>>>> Would it be legitimate to write "p-value = 0"? Thanks a lot,
> >>>>>>>>>
> >>>>>>>>> -- bogdan
> >>>>>>>>>
> >>>>>>>>>         [[alternative HTML version deleted]]
> >>>>>>>>>
> >>>>>>>>> ______________________________________________
> >>>>>>>>> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >>>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>>>>>>> PLEASE do read the posting guide
> >>>>>>>>> http://www.R-project.org/posting-guide.html
> >>>>>>>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>> --
> >>>>> ----------------------------------------------------------
> >>>>>
> >>>>> Vivek Das, PhD
> >>>>>