[R] (no subject)
andrew collier
digitalpenis at bluebottle.com
Wed Nov 7 18:23:55 CET 2007
hello,
i am a bit of a statistical neophyte and am currently trying to make sense of confidence intervals for correlation coefficients. i am using the cor.test() function. the documentation is quite terse and i am having trouble tying up the output from this function with what i have read in the literature. so, for example, i make two sequences and calculate the correlation coefficient:
> x <- runif(20)
> y <- jitter(x, amount = 0.7)
> cor(x, y)
[1] 0.5198252
now i want to establish what confidence i can attach to this value. from the table i retrieved from the article "Understanding Correlation" by r. j. rummel [online] i get that the probability of a correlation coefficient of 0.5198252 arising by chance from two sequences of length 20 is less than 0.01. so it seems that i can attach some significance to the result. i still don't understand where the table comes from, and it only goes up as far as sequences of length 1000. the data i want to analyse has a length of more than 70000, so i need to calculate these confidence levels myself. i assume that cor.test() is the way to do this. so i tried:
> cor.test(x, y, "greater", conf.level = 0.95)
Pearson's product-moment correlation
data: x and y
t = 2.5816, df = 18, p-value = 0.009405
alternative hypothesis: true correlation is greater than 0
95 percent confidence interval:
0.1753340 1.0000000
sample estimates:
cor
0.5198252
> cor.test(x, y, "less", conf.level = 0.95)
Pearson's product-moment correlation
data: x and y
t = 2.5816, df = 18, p-value = 0.9906
alternative hypothesis: true correlation is less than 0
95 percent confidence interval:
-1.0000000 0.7509089
sample estimates:
cor
0.5198252
> cor.test(x, y, "two.sided", conf.level = 0.95)
Pearson's product-moment correlation
data: x and y
t = 2.5816, df = 18, p-value = 0.01881
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1003997 0.7823738
sample estimates:
cor
0.5198252
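for reference, the t statistic and the three p-values printed above can be reproduced by hand. this is a minimal sketch, assuming the standard t test for a pearson correlation (t = r * sqrt(n - 2) / sqrt(1 - r^2) on n - 2 degrees of freedom). r and n are copied from the output above, and the variable names are only illustrative:

```r
# values copied from the cor.test() output above
r <- 0.5198252   # sample correlation
n <- 20          # length of each sequence

df     <- n - 2
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# a p-value is the probability, assuming the true correlation is 0,
# of a t statistic at least as extreme as the observed one, where
# "extreme" is measured in the direction named by the alternative
p_greater   <- pt(t_stat, df, lower.tail = FALSE)
p_less      <- pt(t_stat, df, lower.tail = TRUE)
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

print(c(t = t_stat, greater = p_greater,
        less = p_less, two.sided = p_two_sided))
```

the two one-sided p-values sum to 1, and the two-sided value is twice the smaller of them, which is the pattern seen across the three outputs.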
i reckon that the first invocation of the function is closest to what i am looking for. now the rest of the output from the function is a total mystery to me. could anyone please tell me:
o what is a p-value?
o how to interpret the quoted confidence interval?
i do see that as i increase the conf.level input parameter to cor.test() (with the "greater" alternative) the lower bound of the confidence interval gets lower:
0.95 -> 0.1753340 1.0000000
0.975 -> 0.1003997 1.0000000
0.995 -> -0.04859184 1.00000000
does this mean that with 99.5% certainty the correlation coefficient should lie in the range -0.04859184 to 1.00000000? hmmm. i am doubtful. plus this doesn't really answer my question, which is more about what confidence i can assign to the measured correlation coefficient (0.5198252).
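the bounds listed above are consistent with an interval built on the fisher z transform. the following is a sketch under that assumption (atanh(r) treated as approximately normal with standard error 1 / sqrt(n - 3)). `fisher_ci` is a hypothetical helper written for illustration, not a function that ships with R:

```r
# hypothetical helper: confidence interval for a correlation via fisher's z
fisher_ci <- function(r, n, conf.level = 0.95,
                      alternative = c("two.sided", "greater", "less")) {
  alternative <- match.arg(alternative)
  z  <- atanh(r)          # fisher z transform of the sample correlation
  se <- 1 / sqrt(n - 3)   # approximate standard error on the z scale
  lim <- switch(alternative,
    two.sided = z + c(-1, 1) * se * qnorm((1 + conf.level) / 2),
    greater   = c(z - se * qnorm(conf.level), Inf),
    less      = c(-Inf, z + se * qnorm(conf.level)))
  tanh(lim)               # back to the correlation scale; tanh(Inf) is 1
}

ci_two     <- fisher_ci(0.5198252, 20, 0.95,  "two.sided")
ci_greater <- fisher_ci(0.5198252, 20, 0.95,  "greater")
ci_995     <- fisher_ci(0.5198252, 20, 0.995, "greater")
print(rbind(ci_two, ci_greater, ci_995))
```

a one-sided 97.5% interval shares its lower bound with the two-sided 95% interval, which is why 0.1003997 shows up in both places. the usual reading is a statement about the procedure rather than about this one interval: intervals built this way cover the true correlation in roughly the stated fraction of repeated samples.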
an alternative question would be: given two sequences and a calculated correlation coefficient, with what probability could i assert that the underlying processes are indeed correlated and that the calculated correlation coefficient does not simply arise by chance?
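on the matter of chance, a table of critical correlation values can be regenerated for any sequence length, including ones beyond 1000. this is a minimal sketch, assuming such a table is based on the t distribution with n - 2 degrees of freedom and a one-tailed test. `critical_r` is a hypothetical helper name:

```r
# hypothetical helper: smallest positive correlation that is significant
# at level alpha (one-tailed) for two sequences of length n
critical_r <- function(n, alpha = 0.01) {
  df     <- n - 2
  t_crit <- qt(1 - alpha, df)
  # invert t = r * sqrt(df) / sqrt(1 - r^2) to solve for r
  t_crit / sqrt(t_crit^2 + df)
}

lengths <- c(20, 1000, 70000)
r_crit  <- sapply(lengths, critical_r)
print(data.frame(n = lengths, critical_r = r_crit))
```

with n = 20 the threshold is about 0.516, so 0.5198252 just clears it. with 70000 points the threshold is below 0.01, so even a very weak correlation would count as significant; significance on its own says little about how strong the correlation is.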
please forgive my ignorance. any help will be vastly appreciated. thanks!
best regards,
andrew.