[R] Basic statistic (Was: (no subject))
Petr PIKAL
petr.pikal at precheza.cz
Thu Nov 8 12:07:20 CET 2007
Hi
r-help-bounces at r-project.org wrote on 07.11.2007 18:23:55:
> hello,
>
> i am a bit of a statistical neophyte and currently trying to make some sense
> of confidence intervals for correlation coefficients. i am using the
> cor.test() function. the documentation is quite terse and i am having trouble
> tying up the output from this function with stuff that i have read in the
> literature. so, for example, i make two sequences and calculate the
> correlation coefficient:
>
> > x <- runif(20)
> > y <- jitter(x, amount = 0.7)
> > cor(x, y)
> [1] 0.5198252
>
> now i want to establish what confidence i can attach to this value. from the
> table i retrieved from the article "Understanding Correlation" by r. j. rummel
> [online] i get that the probability of a correlation coefficient of 0.5198252
> arising by chance from two sequences of length 20 is less than 0.01. so this
> seems like i can attach some significance to the result. i still don't
> understand where the table comes from and it only goes up as far as sequences
> of length 1000. the data i am wanting to analyse has length of more than
> 70000, so i need to calculate these confidence levels myself. i assume that
> cor.test() is the way to do this. so i tried:
You should consult a basic statistics textbook. Some are listed in the
recommended literature on CRAN, but much is explained in the output itself.
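As for where such a table comes from: for Pearson's r, under the usual
normality assumption, the critical values can be reproduced from the t
distribution, because t = r * sqrt(n - 2) / sqrt(1 - r^2) has n - 2 degrees of
freedom when the true correlation is zero. A minimal sketch; the helper name
critical_r is made up for illustration and is not part of R:

```r
# Smallest r that is significant at level alpha (one-sided test)
# for a sample of size n; inverts t = r * sqrt(df) / sqrt(1 - r^2).
critical_r <- function(n, alpha = 0.01) {
  df <- n - 2
  t_crit <- qt(1 - alpha, df)
  t_crit / sqrt(df + t_crit^2)
}

r_crit_20  <- critical_r(20)      # about 0.516, so r = 0.5198 is just past it
r_crit_70k <- critical_r(70000)   # no table needed for long sequences
print(c(n20 = r_crit_20, n70000 = r_crit_70k))
```

With 70000 points even a very small r clears this threshold, so statistical
significance by itself says little about how strong the relationship is.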
>
> > cor.test(x, y, "greater", conf.level = 0.95)
>
> Pearson's product-moment correlation
>
> data: x and y
> t = 2.5816, df = 18, p-value = 0.009405
                                 ^^^^^^^^
Here is your 0.01 value: the probability of getting a correlation
coefficient at least this large by chance, if the true correlation is zero.
> alternative hypothesis: true correlation is greater than 0
i.e. you are testing for positive correlation
> 95 percent confidence interval:
> 0.1753340 1.0000000
(one-sided) confidence interval for the correlation coefficient
> sample estimates:
> cor
> 0.5198252
>
> > cor.test(x, y, "less", conf.level = 0.95)
>
> Pearson's product-moment correlation
>
> data: x and y
> t = 2.5816, df = 18, p-value = 0.9906
> alternative hypothesis: true correlation is less than 0
i.e. you are testing for negative correlation
> 95 percent confidence interval:
> -1.0000000 0.7509089
> sample estimates:
> cor
> 0.5198252
>
> > cor.test(x, y, "two.sided", conf.level = 0.95)
>
> Pearson's product-moment correlation
>
> data: x and y
> t = 2.5816, df = 18, p-value = 0.01881
> alternative hypothesis: true correlation is not equal to 0
i.e. you are testing for correlation of either sign
> 95 percent confidence interval:
> 0.1003997 0.7823738
> sample estimates:
> cor
> 0.5198252
>
> i reckon that the first invocation of the function is closest to what i am
> looking for. now the rest of the output from the function is a total mystery
> to me. could anyone please tell me:
>
> o what is a p-value?
Wikipedia says:
In statistical hypothesis testing, the p-value is the probability of
obtaining a result at least as extreme as a given data point, assuming the
data point was the result of chance alone. The fact that p-values are
based on this assumption is crucial to their correct interpretation.
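For the example in this thread, the three p-values can be recomputed by hand
from r and n. This is only a sketch, assuming the standard t-based test that
the cor.test() output reports; the variable names are mine:

```r
r  <- 0.5198252   # sample correlation from the example
n  <- 20          # length of each sequence
df <- n - 2

# Test statistic reported by cor.test() as "t"
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Probability of a result at least this extreme if the true correlation is 0
p_greater   <- pt(t_stat, df, lower.tail = FALSE)           # alternative "greater"
p_less      <- pt(t_stat, df)                               # alternative "less"
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # alternative "two.sided"

print(c(t = t_stat, greater = p_greater, less = p_less, two.sided = p_two_sided))
```

The two-sided p-value is simply twice the smaller one-sided value, which is
why it is larger than the p-value for "greater".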
> o how to interpret the quoted confidence interval?
>
> i do see that as i increase the conf.level input parameter to cor.test() the
> lower bound of the confidence interval gets lower:
>
> 0.95 -> 0.1753340 1.0000000
> 0.975 -> 0.1003997 1.0000000
> 0.995 -> -0.04859184 1.00000000
>
> does this mean that with 99.5% certainty the correlation coefficient should
> lie in the range -0.04859184 to 1.00000000? hmmm. i am doubtful. plus this
> doesn't really answer my question, which is more about what confidence i can
> assign to the measured correlation coefficient (0.5198252).
Why not? Those figures are really what they seem to be. In the first case,
based on the data and the one-sided alternative of positive correlation, the
95% confidence interval for the true correlation coefficient runs from 0.17
to 1. Strictly speaking the 95% refers to the procedure: intervals built this
way cover the true coefficient in about 95% of repeated samples. If you want
a higher confidence level you need to widen the interval (and if you want to
be 100% sure it has to cover the whole range from -1 to 1 :-).
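The intervals printed by cor.test() can be reproduced with Fisher's z
transformation, which is the method its help page names for Pearson's
correlation. A sketch only; the helper name fisher_ci and its one_sided
argument are my own and not part of R:

```r
# Confidence interval for a Pearson correlation via Fisher's z transform:
# atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
fisher_ci <- function(r, n, conf.level = 0.95, one_sided = FALSE) {
  z  <- atanh(r)
  se <- 1 / sqrt(n - 3)
  if (one_sided) {
    # lower bound only, as with alternative = "greater"
    c(tanh(z - qnorm(conf.level) * se), 1)
  } else {
    q <- qnorm(1 - (1 - conf.level) / 2)
    tanh(z + c(-1, 1) * q * se)
  }
}

ci_two <- fisher_ci(0.5198252, 20)                    # cor.test(): 0.1003997 0.7823738
ci_one <- fisher_ci(0.5198252, 20, one_sided = TRUE)  # cor.test(): 0.1753340 1.0000000
print(rbind(ci_two, ci_one))
```

Raising conf.level makes qnorm() larger, which is exactly why the lower bound
drops from 0.175 at 0.95 to about -0.049 at 0.995 in the question.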
Regards
Petr
>
> an alternative question would be: given two sequences and a calculated
> correlation coefficient, with what probability could i assert that the
> underlying processes are indeed correlated and that the calculated correlation
> coefficient does not simply arise by chance.
>
> please forgive my ignorance. any help will be vastly appreciated. thanks!
>
> best regards,
> andrew.