[R] Need to find most likely betas

Mon Feb 19 19:58:42 CET 2007

Pierre,

Conceptually, latent variable analysis/SEM methods seem like they might
be apropos here, although I unfortunately don't have much hands-on
experience with them. If they are, John Fox's sem package might be of
value. More information is at:

http://socserv.mcmaster.ca/jfox/Misc/sem/index.html

Since you mention time, if this involves repeated measures, then you
might want to look at lmer() in the lme4 package, for which there is a
recently created SIG mailing list. More information is at:

https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
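
Since the rows are in time order, a simpler first check than a mixed
model might be to forecast each row from a fit on the rows before it
and look for the first large one-step-ahead prediction error (recursive
residuals, the idea behind several structural-change tests). This is
only a rough sketch on your simulated data. It assumes the rows are
time-ordered, and the choice of start = 6 rows before the first
forecast is my own, not anything from your example:

```r
# Sketch only: one-step-ahead (recursive) prediction errors, assuming the
# rows of x and y are in time order. 'start' is an assumed choice.
set.seed(1)
unknownbeta <- matrix(seq(100, 500, 100), 25, 5, byrow = TRUE)
x <- matrix(runif(25 * 5), 25)
unknownbeta[25, 5] <- 100   # the change at time 25 from the example
y <- rowSums(unknownbeta * x)

start <- 6                  # rows in the first fit (more than 5 coefficients)
pred_err <- rep(NA_real_, nrow(x))
for (i in (start + 1):nrow(x)) {
  past <- seq_len(i - 1)
  fit <- lm(y[past] ~ 0 + x[past, ])
  # forecast row i using coefficients estimated on rows 1..(i - 1) only
  pred_err[i] <- y[i] - sum(coef(fit) * x[i, ])
}
print(round(pred_err, 4))
# first time point where the old relationship clearly fails
print(which.max(abs(pred_err)))
```

On your data the errors are essentially zero through row 24 and jump at
row 25. Note, though, that with only one row after the break the data
cannot say which of the five betas moved: a change in any single beta
could absorb that one discrepancy. Telling "what changed" would need
several observations after the change.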

Perhaps others will jump in with additional thoughts.
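
Along the lines of simply removing the single observation, mentioned in
my earlier reply, here is a rough leave-one-out sketch on your example:
refit with each row dropped in turn and see which removal gives the
best fit. It relies on your simulated data having no noise term, so the
"right" deletion gives an essentially zero residual sum of squares; with
real data you would be looking for a marked drop rather than a zero.
The names rss_without, suspect and fit_drop are just illustrative:

```r
# Sketch only: leave-one-out refits on the simulated example. Assumes, as in
# the example, that y has no noise term, so dropping the bad row should give
# an essentially perfect fit.
set.seed(1)
unknownbeta <- matrix(seq(100, 500, 100), 25, 5, byrow = TRUE)
x <- matrix(runif(25 * 5), 25)
unknownbeta[25, 5] <- 100
y <- rowSums(unknownbeta * x)

# residual sum of squares of the fit with row i left out
rss_without <- sapply(seq_len(nrow(x)), function(i) {
  fit <- lm(y[-i] ~ 0 + x[-i, ])
  sum(residuals(fit)^2)
})
suspect <- which.min(rss_without)
print(suspect)

# refit without the suspect row (a zero weight for it gives the same betas)
fit_drop <- lm(y ~ 0 + x, subset = -suspect)
print(coef(fit_drop))
```

Here this flags row 25, and refitting without it recovers 100, 200, 300,
400 and 500. Deleting rows one at a time can only point to a single bad
row; by itself it says nothing about which beta changed.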

HTH,

Marc

On Mon, 2007-02-19 at 13:32 -0500, Pierre Lapointe wrote:
> Hi Marc,
> 
> In my example, there has been a regime change at time 25, and I'd like
> to find a way to discover (1) what has changed and (2) when it changed.
> 
> The problem is that only the x and y values are observed; the
> unknownbetas are... unknown. If you look at x and y alone, you can't
> really tell that anything has changed.
> 
> It is not an outlier per se, as it involves a change in one of the unknownbetas.
> 
> In other words, I'm trying to single out which unknownbetas vs. x
> relationships still hold after time 25.
> 
> I know it's complicated, but if you have any pointers, they will be appreciated.
> 
> Thanks
> 
> On 2/19/07, Marc Schwartz <marc_schwartz at comcast.net> wrote:
> > On Mon, 2007-02-19 at 09:58 -0500, Pierre Lapointe wrote:
> > > Hello,
> > >
> > > I have a particular situation where a single "wrong" observation is
> > > impacting the results of a traditional regression to the point that
> > > betas become unreliable.  I need a way to calculate the most likely
> > > betas.  Here's an example:
> > >
> > > set.seed(1)
> > > unknownbeta <- matrix(seq(100, 500, 100), 25, 5, byrow = TRUE)
> > > x <- matrix(runif(25 * 5), 25)
> > > y <- rowSums(unknownbeta * x)
> > > summary(lm(y ~ 0 + x))  # gets back the unknown betas
> > >
> > > # Now, let's introduce a single wrong data point.
> > >
> > > unknownbeta[25, 5] <- 100
> > > y <- rowSums(unknownbeta * x)
> > > summary(lm(y ~ 0 + x))  # every beta changes
> > >
> > > I need to find out what the most likely betas are in the second
> > > example.  There is no obvious way to know that row 25 has the wrong
> > > input.  I would even be happy if the conclusion were that x1:x4 are
> > > 100, 200, 300 and 400 and that x5 is zero.
> > >
> > > Thanks
> >
> > It is not clear what you mean by a "wrong" observation.  Is the data
> > completely bad because it was improperly collected?  Is this an
> > observation that has correct data, but is an "outlier" relative to the
> > other observations? Is the observation missing data, where values can be
> > reasonably imputed?
> >
> > Are you in a setting where the observation MUST be included in the
> > regression rather than be deleted? For example an "Intent to Treat"
> > analysis in a clinical trial?
> >
> > Depending upon the context, your options may range from simply removing
> > the single observation from the regression, through some form of
> > weighting of the observations, to a robust regression methodology,
> > among others.
> >
> > This is not strictly an R question, but one of methodology, and the
> > appropriate choice may well be influenced by "community" standards and
> > prior work within your particular discipline.
> >
> > HTH,
> >
> > Marc Schwartz