[R] Increasing the number of observations worsens the regression model
Raffa
Sat May 25 14:38:07 CEST 2019
I have the following code:
```
rm(list=ls())
N = 30000
xvar <- runif(N, -10, 10)
e <- rnorm(N, mean=0, sd=1)
yvar <- 1 + 2*xvar + e
plot(xvar,yvar)
lmMod <- lm(yvar~xvar)
print(summary(lmMod))
domain <- seq(min(xvar), max(xvar))  # define a vector of x values to feed into the model
lines(domain, predict(lmMod, newdata = data.frame(xvar = domain)))  # add regression line, using `predict` to generate y-values
```
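For reference, here is a minimal sketch of the same simulation with a fixed seed added (my code above does not set one; the seed value is arbitrary), which also computes the slope and intercept directly from the closed-form OLS formulas as a cross-check on lm():
```
set.seed(1)                                  # seed added only so the run is reproducible
N <- 30000
xvar <- runif(N, -10, 10)
yvar <- 1 + 2 * xvar + rnorm(N, mean = 0, sd = 1)

fit <- lm(yvar ~ xvar)
coef(fit)                                    # should come out close to 1 (intercept) and 2 (slope)

# closed-form simple-regression estimates, computed from cov()/var() rather than through lm()
slope     <- cov(xvar, yvar) / var(xvar)
intercept <- mean(yvar) - slope * mean(xvar)
c(intercept = intercept, slope = slope)
```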
I expected the coefficients to be close to [1, 2]. Instead, R keeps giving me estimates that look random, are not statistically significant, and do not fit the model, even though I have 30,000 observations. For example:
```
Call:
lm(formula = yvar ~ xvar)

Residuals:
    Min      1Q  Median      3Q     Max
-21.384  -8.908   1.016  10.972  23.663

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0007145  0.0670316   0.011    0.991
xvar        0.0168271  0.0116420   1.445    0.148

Residual standard error: 11.61 on 29998 degrees of freedom
Multiple R-squared:  7.038e-05,  Adjusted R-squared:  3.705e-05
F-statistic: 2.112 on 1 and 29998 DF,  p-value: 0.1462
```
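For completeness, a quick way to quantify how far off the estimates are (assuming the `lmMod` fit from the code above) would be to look at the confidence intervals:
```
confint(lmMod)  # 95% confidence intervals for (Intercept) and xvar;
                # if the fit behaved as expected they would bracket the true values 1 and 2
```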
The strange thing is that the code works perfectly for N = 200 or N = 2000; it is only for larger N (for example, N = 20000) that this happens. I have already asked on CrossValidated
<https://stats.stackexchange.com/questions/410050/increasing-number-of-observations-worsen-the-regression-model>
but the code works fine for them. Any help?
I am running R 3.6.0 on Kubuntu 19.04.
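In case it is relevant, this is how the exact setup can be reported (since R 3.4, sessionInfo() also lists the BLAS/LAPACK libraries in use):
```
sessionInfo()  # R version, platform, and the BLAS/LAPACK libraries this session uses
```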
Best regards
Raffaele