[R] outlier

Prof Brian Ripley ripley at stats.ox.ac.uk
Tue Jun 17 23:30:44 CEST 2003


On Tue, 17 Jun 2003, kan Liu wrote:

> Hi, many thanks for your advice. I appreciate it very
> much. Maybe I can make the question clearer: I want
> to evaluate the correlation between two variables: one
> is the actual outputs of a system, the other is the
> predicted values of those outputs obtained using
> neural networks. When I made scatterplots in Excel, I
> could get the linear equation and the corresponding
> R-squared. At the bottom of the page
> http://www.statsoftinc.com/textbook/stathome.html, it
> is mentioned that outliers can sometimes bias the
> correlation coefficient. So I thought it might
> be worth removing outliers before calculating
> R-squared in R. It seems to be a bad idea according to
> your comments.

Yes. That's the whole point of robust methods: compensate rather than 
reject.
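
A minimal sketch of the difference, in Python rather than R purely for
illustration, with made-up data and hypothetical helper names (pearson,
ranks, spearman). A rank-based measure such as Spearman's is one simple way
to limit an outlier's influence without deleting it; it is not the cov.rob
estimator, and the ranking helper assumes there are no ties:

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values (assumes no ties, for simplicity)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented data: y equals x except for one gross outlier at x = 5.
x = [float(i) for i in range(1, 11)]
y = list(x)
y[4] = 100.0

r_pearson = pearson(x, y)    # dominated by the single outlier
r_spearman = spearman(x, y)  # the outlier only moves to the top rank

print("Pearson:  %.3f" % r_pearson)
print("Spearman: %.3f" % r_spearman)
```

All ten points are kept in both calculations; only the weight given to the
aberrant one differs.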

> Now can you make comments on how to
> evaluate the performance of the neural network model
> in predicting the actual outputs?

If you are interested in correlation coefficients, use cov.rob. However,
this is predicted vs actual, and you probably do want to penalize bad
predictions, not reject them.  It's up to you to choose a suitable loss
function for your application.  In particular, if the predicted values
were always 1e-45 times the actual values minus 1e310, the correlation
would be one and the predictions would be derisory.
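
To make that last point concrete, here is a small sketch (Python for
illustration, invented data, hypothetical helper names). It uses a milder
transformation, 0.01 times the actual values minus 100, because 1e310 is not
representable as a double. The conclusion is the same: the correlation is
one up to rounding, while loss functions such as RMSE or mean absolute error
expose the predictions as useless.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(actual, predicted):
    """Root mean squared error: penalizes large misses heavily."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: penalizes misses in proportion to their size."""
    n = len(actual)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / n


# Invented "actual" outputs and deliberately terrible "predictions" that are
# nevertheless an exact increasing linear function of the actual values.
actual = [1.0, 2.5, 3.1, 4.8, 6.0, 7.2]
predicted = [0.01 * a - 100.0 for a in actual]

r = pearson(actual, predicted)
loss_rmse = rmse(actual, predicted)
loss_mae = mae(actual, predicted)

print("correlation: %.6f" % r)
print("RMSE: %.2f  MAE: %.2f" % (loss_rmse, loss_mae))
```

Which loss to report (squared, absolute, or something asymmetric) still
depends on what a bad prediction costs in the application.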

> 
> Kan 
> 
> --- Spencer Graves <spencer.graves at PDF.COM> wrote:
> > It is also wise to make scatterplots, as shown by the famous example
> > of four scatterplots with the same R^2, where the first shows the
> > standard ellipsoid pattern implied by the assumptions while the other
> > three indicate very clearly that the assumptions are incorrect.  See
> > Anscombe (1973) "Graphs in Statistical Analysis", The American
> > Statistician, 27: 17-22, reproduced in, e.g., du Toit, Steyn and Stumpf
> > (1986) Graphical Exploratory Data Analysis (Springer).
> > 
> > hth.  spencer graves
> > 
> > Prof Brian Ripley wrote:
> > > On Tue, 17 Jun 2003, kan Liu wrote:
> > > 
> > > 
> > >> I want to calculate the R-squared between two variables. Can you advise
> > >> me how to identify and remove the outliers before performing the
> > >> R-squared calculation?
> > > 
> > > Easy: you don't.  It makes no sense to consider R^2 after arbitrary outlier
> > > removal: if I remove all but two points I get R^2 = 1!
> > > 
> > > R^2 is normally used to measure the success of a multiple regression, but
> > > as you mention two variables, did you just mean the Pearson
> > > product-moment correlation?  It makes more sense to use a robust measure
> > > of correlation, as in cov.rob (package lqs) or even Spearman or Kendall
> > > measures (cor.test in package ctest).
> > > 
> > > If you intended to do this for a multiple regression, you need to do some
> > > sort of robust regression and use a robust measure of fit.
> > > 
> > 
> > 
> 
> 

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
