[R] A doubt about "lm" and the meaning of its summary

Tue Aug 19 04:19:46 CEST 2008

Hi Alberto,

Please disregard my previous note - I probably had a black-out!!!

--- On Tue, 19/8/08, Alberto Monteiro <albmont at centroin.com.br> wrote:

> From: Alberto Monteiro <albmont at centroin.com.br>
> Subject: [R] A doubt about "lm" and the meaning of its summary
> To: r-help at r-project.org
> Received: Tuesday, 19 August, 2008, 4:31 AM
> I have a conceptual problem (sort of). Maybe there's a
> simple solution, 
> maybe not.
> 
> First, let me explain the test case that goes ok.
> 
> Let x be a (fixed) n-dimensional vector. I simulate a lot
> of linear models, 
> y = m x + c + error, then I do a lot of regressions. As
> expected, the 
> estimated values of m (let's call them m.bar) are
> distributed according to a 
> Student's t distribution.
> 
> More precisely (test case, it takes a few minutes to run):
> #
> # start with fixed numbers
> #
>   m <- sample(c(-1, -0.1, 0.1, 1), 1)
>   c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
>   sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
>   n <- sample(c(4,5,10,1000), 1)
>   x <- 1:n # it's possible to use other x
>   NN <- sample(c(3000, 4000, 5000), 1)
>   m.bar <- m.std.error <- 0  # these vectors will
> hold the simulation output
> 
> #
> # Now let's repeat NN times:
> # simulate y
> # regress y ~ x
> # store m.bar and its error
> #
>   for (i in 1:NN) {
>     y <- m * x + c + sigma * rnorm(n)
>     reg <- lm(y ~ x)
>     m.bar[i] <- summary(reg)$coefficients['x',
> 'Estimate']
>     m.std.error[i] <-
> summary(reg)$coefficients['x', 'Std. Error']
>   }
> #
> # Finally, let's analyse it
> # The distribution of (m.bar - m) / m.std.error is
> # a Student's t with n - 2 degrees of freedom.
> # Is it? Let's test!
> #
>   print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))
> 
> Now, this goes ok, with ks.test returning a number
> uniformly distributed in 
> the 0-1 interval.
> 
> Next, I did a slightly different model. This model is a
> reversion model, 
> where the simulated random variable lp goes along the
> equation:
> lp[t + 1] <- (1 + m) * lp[t] + c + error
> 
> I tried to use the same method as above, defining 
> x = lp[1:n], y = lp[2:(n+1}] - lp[1:n], with equation
> y <- m * x + c + error
> 
> And now it breaks. Test case (it takes some minutes to
> run):
> 
> #
> # start with fixed numbers
> #
>   m <- runif(1, -1, 0)  # m must be something in the
> (-1,0) interval
>   c <- sample(c(-1, -0.1, 0, 0.1, 1), 1)
>   sigma <- sample(c(0.1, 0.2, 0.5, 1), 1)
>   n <- sample(c(4,5,10,1000), 1)
>   NN <- sample(c(3000, 4000, 5000), 1)
>   m.bar <- m.std.error <- 0  # these vectors will
> hold the simulation output
> 
> #
> # Now let's repeat NN times:
> # simulate y
> # regress y ~ x
> # store m.bar and its error
> #
>   for (i in 1:NN) {
>     lp <- 0
>     lp[1] <- 0
>     for (j in 1:n) {
>       lp[j+1] = (1 + m) * lp[j] + c + sigma * rnorm(1)
>     }
>     x <- lp[1:n]
>     y <- lp[-1] - x
>     reg <- lm(y ~ x)
>     m.bar[i] <- summary(reg)$coefficients['x',
> 'Estimate']
>     m.std.error[i] <-
> summary(reg)$coefficients['x', 'Std. Error']
>   }
> #
> # Finally, let's analyse it
> # The distribution of (m.bar - m) / m.std.error should be
> # a Student's t with n - 2 degrees of freedom.
> # Is it? Let's test!
> #
>   print(ks.test((m.bar - m) / m.std.error, rt(NN, n - 2)))
> 
> ... and now it's not.
> 
> What's wrong with this? I suspect that the model y ~ x
> does only give 
> meaningful values when x is deterministic; in this case x
> is stochastic. Is 
> there any correct way to estimate this model giving
> meaningful outputs?
> 
> Alberto Monteiro
> 
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained,
> reproducible code.