[R] linear correlation?

Andrew Perrin andrew_perrin at unc.edu
Fri Mar 8 15:22:26 CET 2002

On Fri, 8 Mar 2002, dechao wang wrote:

> Many thanks for all who have joined the discussion.
> From the instructive discussion, it seems there may
> not have a command or function to deal with DIRECTLY
> the comparison between two items, such as 
> x1<-c(weight1, ...weightn, height1,...heightn)
> x2<-c(weight1, ...weightn, height1,...heightn)

It's not the lack of a command, it's the question of a *method*. There
are, as I said before, fields of statistics dedicated to measuring the
similarity/difference between vectors of measures. Chapters 10 and 11 in
Venables and Ripley's _Modern Applied Statistics with S-Plus_ might be a
good place for you to start. But it's most definitely *not* the right idea
to simply decide that cor() sounds like a nice command so you'll use it,
regardless of whether it has any validity.

> However, this may be quite a common question in the
> real world to be asked. As we have already seen that
> correlation analysis could be used to address this
> issue, except that the resolution rate is not good. 
> According to the theory of gray systems, several
> measures can be taken to increase the compatibility of
> different items which contain different units and
> measurements. Take trees for example, after the data
> were normalised, the relation degree between tree1 and
> tree2 is 0.9997, while the relation degree between
> tree1 and tree3 is 0.4988.

Well, if you'd like to define the term "relation degree" to mean "the
meaningless correlation between the various measures across cases" or
something to that effect, you're free to do so. But you'd need to make a
case that the correlation coefficient between trees is a statistically
appropriate way to measure the degree of similarity between cases, which
is what you're really asking. Given that lots of people have worked lots
of years to develop appropriate methods for measuring similarity between
cases, I suspect your research using this measure would be, er... poorly
received.

> > tree1<-c(1,  2,  3,   100, 200, 300)
> > tree2<-c(1.1,2.8,3.3, 108, 209, 303)
> > tree3<-c(3.8,6.8,5.3, 108, 209, 303)
> > trees<-cbind(tree1,tree2,tree3)
> > cor(trees,trees)
>          tree1     tree2     tree3
> tree1 1.000000 0.9996549 0.9997620
> tree2 0.999655 1.0000000 0.9999687
> tree3 0.999762 0.9999687 1.0000000
> > 
> > tree1<-c(tree1[1:3]/6.8, tree1[4:6]/303)
> > tree2<-c(tree2[1:3]/6.8, tree2[4:6]/303)
> > tree3<-c(tree3[1:3]/6.8, tree3[4:6]/303)
> > trees<-cbind(tree1,tree2,tree3)
> > cor(trees,trees)
>           tree1     tree2     tree3
> tree1 1.0000000 0.9918951 0.4988191
> tree2 0.9918951 1.0000000 0.5806924
> tree3 0.4988191 0.5806924 1.0000000

All you've shown here is that it's possible to calculate a
correlation. But R won't tell you whether it's a good idea or not -- R's
job is to calculate. It assumes you know what you're doing. In this case,
I submit, you do not. The fact that a correlation coefficient can be
calculated does not mean that it says anything at all about the similarity
between cases, which is your real question here.  As an example, I give
you Joe and Jane, two rather different individuals: Joe is 75 inches tall
and weighs 90kg. He has two eyes, and is 45 years old. Jane, by contrast,
is only 48 inches tall, and weighs only 57.6kg. She, too, has two eyes, but
is only 32 years old.

> people
     Height Weight NrEyes Age
joe      75   90.0      2  45
jane     48   57.6      2  32

Nevertheless, according to your metric, they are very similar:
> cor(joe,jane)
[1] 0.998295
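
For anyone who wants to reproduce that number, here is a minimal sketch. It
assumes people is a matrix with one row per person, built with rbind(); that
construction is my assumption and is not shown in the session above. The
second half re-records weight in grams, a purely hypothetical change of
units, to show that the coefficient tracks the units and the mix of
variables rather than how alike the two people are.

```r
# Assumed construction of `people` (not shown in the original session):
# one row per person, one column per measured variable.
people <- rbind(
  joe  = c(Height = 75, Weight = 90.0, NrEyes = 2, Age = 45),
  jane = c(Height = 48, Weight = 57.6, NrEyes = 2, Age = 32)
)

# Correlation "between cases": each person's row is treated as if it were
# a variable, and the four measures as if they were observations.
r_kg <- cor(people["joe", ], people["jane", ])
print(round(r_kg, 6))

# The same two people, with weight recorded in grams instead of kilograms
# (hypothetical change of units, for illustration only).
people_g <- people
people_g[, "Weight"] <- people_g[, "Weight"] * 1000
r_g <- cor(people_g["joe", ], people_g["jane", ])
print(r_g)
```

The first value should match the 0.998295 above. The second moves even
closer to 1, although nothing about Joe or Jane has changed: one large-valued
variable now dominates both vectors.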

My strong advice is that you give up on trying to use correlation
coefficients to measure the similarity between cases and consider methods
that are actually suited to that task.
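
As a rough illustration of what such a method can look like, the sketch
below computes Euclidean distances between the three trees posted earlier
in the thread, after standardising each measurement. The choice of scale()
followed by dist() is my assumption for the sake of example, not a
recommendation for your data; whether that scaling and that distance are
appropriate is exactly the kind of question the literature addresses.

```r
# Tree measurements as posted earlier in the thread, arranged with one row
# per tree (case) and one column per measurement.
cases <- rbind(
  tree1 = c(1,   2,   3,   100, 200, 300),
  tree2 = c(1.1, 2.8, 3.3, 108, 209, 303),
  tree3 = c(3.8, 6.8, 5.3, 108, 209, 303)
)

# Standardise each measurement (column) so that no single unit dominates.
# This scaling is an assumption of the sketch, not a recommendation.
z <- scale(cases)

# Euclidean distance between cases: smaller means more alike on this scale.
d <- dist(z)
print(round(as.matrix(d), 3))
```

With only three cases the standardisation is crude, so treat the numbers as
a demonstration of the mechanics only. A distance object like d is also what
hclust() expects as input, should you want to go on to clustering.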

Andrew J Perrin - andrew_perrin at unc.edu - http://www.unc.edu/~aperrin
 Assistant Professor of Sociology, U of North Carolina, Chapel Hill
      269 Hamilton Hall, CB#3210, Chapel Hill, NC 27599-3210 USA
