[R] Problem with lm Giving Wrong Results

Labone, Thomas |@bone @end|ng |rom em@||@@c@edu
Fri Dec 3 16:28:53 CET 2021


Thanks for the feedback everyone. If you go to https://github.com/csantill/RPerformanceWBLAS/blob/master/RPerformanceBLAS.md you will find the Linux commands to change the default math library. When I switch the BLAS library from MKL to the system default (see sessionInfo below), everything works as expected. I installed version 2020.0-166-1 of "Intel-MKL" from the Linux Mint Software Manager. I may be coming to a hasty conclusion, but there appears to be something wrong with that package or how it interacts with other system software. Any suggestions on who I should notify about the problem (e.g., Intel, Mint, Ubuntu)?

R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.2 tools_4.1.2



Thomas R. LaBone
PhD student
Department of Epidemiology and Biostatistics
Arnold School of Public Health
University of South Carolina
Columbia, South Carolina USA



________________________________
From: Labone, Thomas <labone using email.sc.edu>
Sent: Thursday, December 2, 2021 11:53 AM
To: Bill Dunlap <williamwdunlap using gmail.com>
Cc: r-help using r-project.org <r-help using r-project.org>
Subject: Re: [R] Problem with lm Giving Wrong Results

> summary(fit)

Call:
lm(formula = log(k) ~ Z)

Residuals:
    Min      1Q  Median      3Q     Max
-21.241   1.327   1.776   2.245   4.418

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.03465    0.01916  -1.809   0.0705 .
Z           -0.24207    0.01916 -12.634   <2e-16 ***
---
Signif. codes:  0 �***� 0.001 �**� 0.01 �*� 0.05 �.� 0.1 � � 1

Residual standard error: 1.914 on 9998 degrees of freedom
Multiple R-squared:  0.01467, Adjusted R-squared:  0.01457
F-statistic: 148.8 on 1 and 9998 DF,  p-value: < 2.2e-16

> summary(k)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.2735  3.7658  5.9052  7.5113  9.4399 82.9531
> summary(Z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-3.8906 -0.6744  0.0000  0.0000  0.6744  3.8906
> summary(gm*gsd^Z)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.3767  0.8204  0.9659  0.9947  1.1372  2.4772
>


Thomas R. LaBone
PhD student
Department of Epidemiology and Biostatistics
Arnold School of Public Health
University of South Carolina
Columbia, South Carolina USA


________________________________
From: Bill Dunlap <williamwdunlap using gmail.com>
Sent: Thursday, December 2, 2021 10:31 AM
To: Labone, Thomas <labone using email.sc.edu>
Cc: r-help using r-project.org <r-help using r-project.org>
Subject: Re: [R] Problem with lm Giving Wrong Results

On the 'bad' machines, what did you get for
   summary(fit)
   summary(k)
   summary(Z)
   summary(gm*gsd^Z)
?

-Bill

On Thu, Dec 2, 2021 at 6:18 AM Labone, Thomas <labone using email.sc.edu<mailto:labone using email.sc.edu>> wrote:
In the code below the first and second plots should look pretty much the same, the only difference being that the first has n=1000 points and the second n=10000 points. On two of my Linux machines (info below) the second plot is a horizontal line (incorrect answer from lm), but on my Windows 10 machine and a third Linux machine it works as expected. The interesting thing is that the code works as expected for n <= 4095 but fails for n>=4096 (which equals 2^12). Can anyone else reproduce this problem? Any ideas on how to fix it?

set.seed(132)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# This works
n <- 1000# OK <= 4095
Z <- qnorm(ppoints(n))

k <- sort(rlnorm(n,log(2131),log(1.61)) / rlnorm(n,log(355),log(1.61)))

quantile(k,probs=c(0.025,0.5,0.975))
summary(k)

fit <- lm(log(k) ~ Z)
summary(fit)

gm <- exp(coef(fit)[1])
gsd <- exp(coef(fit)[2])
gm
gsd

plot(Z,k,log="y",xlim=c(-4,4),ylim=c(0.1,100))
lines(Z,gm*gsd^Z,col="red")

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#this does not
n <- 10000# fails >= 4096 = 2^12
Z <- qnorm(ppoints(n))

k <- sort(rlnorm(n,log(2131),log(1.61)) / rlnorm(n,log(355),log(1.61)))

quantile(k,probs=c(0.025,0.5,0.975))
summary(k)

fit <- lm(log(k) ~ Z)
summary(fit)

gm <- exp(coef(fit)[1])
gsd <- exp(coef(fit)[2])
gm
gsd

plot(Z,k,log="y",xlim=c(-4,4),ylim=c(0.1,100))
lines(Z,gm*gsd^Z,col="red")


#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> sessionInfo() #for two Linux machines having problem
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 20.2

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.2  Matrix_1.3-4    tools_4.1.2     expm_0.999-6    grid_4.1.2      lattice_0.20-45

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> sessionInfo() # for a third Linux machine not having the problem
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Linux Mint 19.3

Matrix products: default
BLAS/LAPACK: /opt/intel/compilers_and_libraries_2020.0.166/linux/mkl/lib/intel64_lin/libmkl_rt.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.1.1 tools_4.1.1



Thomas R. LaBone
PhD student
Department of Epidemiology and Biostatistics
Arnold School of Public Health
University of South Carolina
Columbia, South Carolina USA


        [[alternative HTML version deleted]]

______________________________________________
R-help using r-project.org<mailto:R-help using r-project.org> mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

	[[alternative HTML version deleted]]



More information about the R-help mailing list