[R] Need to find most likely betas

Mon Feb 19 19:32:21 CET 2007

Hi Marc,

In my example, there has been a regime change at time 25, and I'd like
to find a way to discover (1) what has changed and (2) when it changed.

The problem is that only the x and y values are observed. The
unknownbetas are... unknown. Looking at x and y alone, you can't really
tell that something has changed.

It is not an outlier per se, as it involves a change in one of the unknownbetas.

In other words, I'm trying to single out which unknownbetas vs. x
relationships still hold after time 25.
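
To make the question concrete, here is a minimal sketch of one way to
look for the "when", assuming a single changed row and noise-free data
as in my example. It refits the regression with each row left out in
turn and flags the row whose removal makes the fit (nearly) exact. The
names loo_rss, suspect and fit_clean are just illustrative, and
lm.fit() is used in place of lm() only to keep the loop short.

```r
set.seed(1)
unknownbeta <- matrix(seq(100, 500, 100), 25, 5, byrow = TRUE)
x <- matrix(runif(25 * 5), 25)
unknownbeta[25, 5] <- 100            # the regime change at time 25
y <- rowSums(unknownbeta * x)

# Leave each row out in turn, refit without an intercept, and record
# the residual sum of squares of the reduced fit.
loo_rss <- sapply(seq_len(nrow(x)), function(i) {
  fit <- lm.fit(x[-i, , drop = FALSE], y[-i])
  sum(fit$residuals^2)
})

# The row whose removal gives the smallest RSS is the suspect.
suspect <- which.min(loo_rss)
suspect

# Betas estimated from the remaining rows.
fit_clean <- lm.fit(x[-suspect, , drop = FALSE], y[-suspect])
fit_clean$coefficients
```

This only works so cleanly because there is no noise and only one row
is affected. It also says nothing about the "what": a single
observation after the change does not by itself pin down which beta
moved.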

I know it's complicated, but if you have any pointers, they will be appreciated.

Thanks

On 2/19/07, Marc Schwartz <marc_schwartz at comcast.net> wrote:
> On Mon, 2007-02-19 at 09:58 -0500, Pierre Lapointe wrote:
> > Hello,
> >
> > I have a particular situation where a single "wrong" observation is
> > impacting the results of a traditional regression to the point that
> > betas become unreliable.  I need a way to calculate the most likely
> > betas.  Here's an example:
> >
> > set.seed(1)
> > unknownbeta <- matrix(seq(100,500,100),25,5,byrow=TRUE)
> > x <-matrix(runif(25*5),25)
> > y <- rowSums(unknownbeta*x)
> > summary(lm(y~0+x)) #gets back the unknown betas.
> >
> > #Now, let's introduce a single wrong data point.
> >
> > unknownbeta[25,5] <-100
> > y <- rowSums(unknownbeta*x)
> > summary(lm(y~0+x)) #every beta changes.
> >
> > I need to find out what are the most likely betas in the second
> > example.  There is no obvious way to know that row 25 has wrong input.
> > I would even be happy if the conclusion was that x1:x4 are 100, 200,
> > 300 and 400 and that x5 is zero.
> >
> > Thanks
>
> It is not clear what you mean by a "wrong" observation.  Is the data
> completely bad because it was improperly collected?  Is this an
> observation that has correct data, but is an "outlier" relative to the
> other observations? Is the observation missing data, where values can be
> reasonably imputed?
>
> Are you in a setting where the observation MUST be included in the
> regression rather than be deleted? For example an "Intent to Treat"
> analysis in a clinical trial?
>
> Depending upon the context, your options may range from simply removing
> the single observation from the regression, considering some form of
> weighting of the observations, to perhaps considering a robust
> regression methodology and others.
>
> This is not strictly an R question, but one of methodology. The
> appropriate choice may be influenced by "community" standards and
> prior work within your particular discipline.
>
> HTH,
>
> Marc Schwartz