[R] stat question - not R question so ignore if not interested

Michael Kubovy kubovy at virginia.edu
Tue Dec 5 23:21:00 CET 2006


On Dec 5, 2006, at 3:42 PM, Leeds, Mark ((IED)) wrote:

> Suppose I do a scatterplot of data (x and y) and there are two clouds
> of points. One cloud is in the bottom left corner of the plot and the
> other cloud is in the upper right.
>
> If I fit a regression line to this data (or, equivalently, calculate a
> correlation), then obviously it is going to seem like x and y are
> related, because a line has to connect the two clouds. But there must
> be a regression assumption that is violated here, because if the
> regressions are done separately on each cloud, then there really isn't
> a relationship between x and y. I was just wondering 1) what assumption
> in regression is being violated in the first case, or 2) whether the
> regression is possibly valid and the results just have some different
> interpretation?

One needs only to look at diagnostic plots:

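Suppose we simulate two such clouds and fit a single line to them:
set.seed(2)
# two independent unit-variance clouds, centred at (0, 0) and (5, 5)
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))
op <- par(mfrow = c(2, 2))  # 2 x 2 grid for the four lm diagnostic plots
plot(lm(y ~ x, xy))
par(op)                     # restore the previous graphics settings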

The single-line model does not fit well: the residuals are not flat as
a function of the fitted values (see the Residuals vs Fitted panel),
and homoscedasticity is violated (see the Scale-Location panel).
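
The first point can also be checked numerically. The following is a minimal sketch, assuming the same simulated data as above; the cloud label is a hypothetical helper that is available here only because the data are simulated. By construction the residuals are uncorrelated with the fitted values over the whole sample, yet within each cloud they should trend downward, which is the non-flat pattern in the Residuals vs Fitted panel.

```r
set.seed(2)
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))
# Hypothetical label: rows 1-300 are the lower cloud, 301-600 the upper one
xy$cloud <- factor(rep(c("lower", "upper"), each = 300))

fit <- lm(y ~ x, xy)

# Over the whole sample OLS forces this correlation to be (numerically) zero
overall_cor <- cor(resid(fit), fitted(fit))

# Within each cloud the residuals are correlated with the fitted values
within_cor <- tapply(seq_len(nrow(xy)), xy$cloud,
                     function(i) cor(resid(fit)[i], fitted(fit)[i]))

print(overall_cor)
print(within_cor)
```

In real data the cloud membership is not given, so this check is only an illustration of what the diagnostic plot is showing.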

When this happens we might try a more flexible approach, such as a
nonparametric regression smooth from the sm package:
library(sm)
xy.sm <- sm.regression(xy$x, xy$y)
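
If the sm package is not installed, a similar picture can be sketched with lowess() from base R. This is a swapped-in smoother, not the kernel estimator that sm.regression uses, and the span f = 1/3 is an arbitrary assumption chosen for illustration.

```r
set.seed(2)
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))

plot(xy$x, xy$y, xlab = "x", ylab = "y")
abline(lm(y ~ x, xy), lty = 2)       # dashed: single OLS line

# Locally weighted smooth; f is the fraction of points used at each x
lo <- lowess(xy$x, xy$y, f = 1/3)
lines(lo, lwd = 2)                   # solid: lowess smooth
```

Comparing the dashed and solid curves gives a rough visual sense of where the straight line departs from the local trend.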

Whenever there is a big discrepancy between an OLS fit and a robust or
nonparametric one, we should not pursue the OLS fit without
reinterpretation, which others have discussed in their replies.
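
One possible reinterpretation, offered here as an assumption and not as a summary of the other replies, is that the pooled slope reflects the difference between the two clouds and not a relationship within either of them. A minimal sketch, again using a hypothetical cloud label that is known only because the data are simulated:

```r
set.seed(2)
xy <- data.frame(y = c(rnorm(300), rnorm(300, 5)),
                 x = c(rnorm(300), rnorm(300, 5)))
# Hypothetical label: rows 1-300 are the lower cloud, 301-600 the upper one
xy$cloud <- factor(rep(c("lower", "upper"), each = 300))

# Slope from one line fitted through both clouds
pooled_slope <- coef(lm(y ~ x, xy))[["x"]]

# Slopes from separate fits within each cloud
within_slopes <- sapply(split(xy, xy$cloud),
                        function(d) coef(lm(y ~ x, d))[["x"]])

# One model with a cloud indicator: x is now adjusted for cloud membership
adjusted <- coef(lm(y ~ x + cloud, xy))

print(pooled_slope)
print(within_slopes)
print(adjusted)
```

With these simulated data the pooled slope should be large while the within-cloud and adjusted slopes should be close to zero, with the cloud indicator absorbing the shift between the two groups.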
_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS:     P.O.Box 400400    Charlottesville, VA 22904-4400
Parcels:    Room 102        Gilmer Hall
         McCormick Road    Charlottesville, VA 22903
Office:    B011    +1-434-982-4729
Lab:        B019    +1-434-982-4751
Fax:        +1-434-982-4766
WWW:    http://www.people.virginia.edu/~mk9y/



