[R] [EXTERNAL] Re: bug in Windows implementation of nlme::groupedData
Key, Melissa
mkey @end|ng |rom |n|o@c|tex@com
Fri Jan 7 23:29:29 CET 2022
John,
Thanks for your response. I agree that the definition of the data frame is poor (in my defense it came directly from the demo code, but I should have checked it more thoroughly). The good news is that your comments caused me to take a closer look at where X was defined, and I found the reason I wasn't getting the same results on my Mac and PC - that error was between keyboard and chair.
There is still something funny going on though (at least relative to my previous experience with how R searches environments:
If X is defined in the global environment, groupedData can find it there and use it. (this is what I'm used to)
If X is defined within a function, groupedData cannot find it, even if groupedData is called within the same function. (this seems strange to me - usually parent.frame() captures information within the function environment, or so I thought)
My solution at the bottom still works - and unlike groupedData, nlme allows a list as input to the data argument (or at least, doesn't check to make sure it's a data frame), so I have a working (albeit hacky) solution that actually makes more sense to me than using groupedData, but it still seems strange that the function cannot find X in its search path.
Thanks again!
Melissa
-----Original Message-----
From: John Fox <jfox using mcmaster.ca>
Sent: Friday, January 7, 2022 4:35 PM
To: Key, Melissa <mkey using infoscitex.com>
Cc: r-help using r-project.org
Subject: [EXTERNAL] Re: [R] bug in Windows implementation of nlme::groupedData
Dear Melissa,
It seems strange to me that your code would work on any platform (it doesn't on my Mac) because the data frame you create shouldn't contain a matrix named "X" but rather columns including those originating from X.
To illustrate:
> X <- matrix(1:12, 4, 3)
> colnames(X) <- c("a", "b", "c")
> X
a b c
[1,] 1 5 9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12
> y <- 1:4
> (D <- data.frame(y, X))
y a b c
1 1 1 5 9
2 2 2 6 10
3 3 3 7 11
4 4 4 8 12
> str(D)
'data.frame': 4 obs. of 4 variables:
$ y: int 1 2 3 4
$ a: int 1 2 3 4
$ b: int 5 6 7 8
$ c: int 9 10 11 12
My session info:
> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Monterey 12.1
Matrix products: default
LAPACK:
/Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] nlme_3.1-153 HRW_1.0-5
loaded via a namespace (and not attached):
[1] compiler_4.1.2 tools_4.1.2 KernSmooth_2.23-20
splines_4.1.2
[5] grid_4.1.2 lattice_0.20-45
I hope this helps,
John
--
John Fox, Professor Emeritus
McMaster University
Hamilton, Ontario, Canada
web: https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fsocialsciences.mcmaster.ca%2Fjfox%2F&data=04%7C01%7Cmkey%40infoscitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=nTXsf6vbxVIstMup5AUlmABv9CdQa3hw44Fdb2lej7A%3D&reserved=0
On 2022-01-07 11:23 a.m., Key, Melissa wrote:
> I am trying to replicate a semi-parametric analysis described in Harezlak, Jaroslaw, David Ruppert, and Matt P. Wand. Semiparametric regression with R. New York, NY: Springer, 2018. (https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Flink.springer.com%2Fbook%2F10.1007%252F978-1-4939-8853-2&data=04%7C01%7Cmkey%40infoscitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=ESCv%2Bp%2FPo7pyKm055Y4nYAK8Cj1RTkorP9ZvahZTPmE%3D&reserved=0).
>
> I can successfully run the analysis, but now I'm trying to move it into my workflow, which requires that the analysis be conducted within a function (using targets package), and the `groupedData` function now fails with an error that it cannot find the `X` matrix (see reprex below). I've tried the reprex on both my personal Mac (where it works??) and on windows machines (where it does not) - so the problem is likely specific to Windows computers (yes, this seems weird to me too).
> All packages have been updated, and I'm running the latest version of R on all machines.
>
> Reprex:
>
> library(HRW) # contains example data and ZOSull function
> library(nlme)
>
> data(growthIndiana)
>
>
> analyze_this <- function(df) {
>
> mean.x <- mean(df$age)
> mean.y <- mean(df$height)
> sd.x <- sd(df$age)
> sd.y <- sd(df$height)
>
> df$x <- (df$age - mean.x) / sd.x
> df$y <- (df$height - mean.y) / sd.y
>
> X <- model.matrix(~ x * male * black, data = df)
> dummyID <- rep(1, length(nrow(X)))
>
> grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)),
> data = data.frame(y = df$y, X, dummyID))
>
> }
>
>
> # doesn't work on Windows machine, does work on the Mac
> analyze_this(growthIndiana)
> #> Error in eval(aux[[2]], object): object 'X' not found
>
> # does work
>
> df <- growthIndiana
>
> mean.x <- mean(df$age)
> mean.y <- mean(df$height)
> sd.x <- sd(df$age)
> sd.y <- sd(df$height)
>
> df$x <- (df$age - mean.x) / sd.x
> df$y <- (df$height - mean.y) / sd.y
>
> X <- model.matrix(~ x * male * black, data = df) dummyID <- rep(1,
> length(nrow(X)))
>
> grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)), data
> = data.frame(y = df$y, X, dummyID))
>
>
> # attempted work-around.
>
> analyze_this2 <- function(df) {
> num.global.knots = 20
> num.subject.knots = 10
>
> mean.x <- mean(df$age)
> mean.y <- mean(df$height)
> sd.x <- sd(df$age)
> sd.y <- sd(df$height)
>
> df$x <- (df$age - mean.x) / sd.x
> df$y <- (df$height - mean.y) / sd.y
>
> X <- model.matrix(~ x * male * black, data = df)
> dummyID <- rep(1, length(nrow(X)))
>
> # grouped_data <- groupedData(y ~ X[,-1]|rep(1, length = nrow(X)),
> data = data.frame(y = df$y, X, dummyID))
>
> global.knots = quantile(unique(df$x), seq(0, 1, length = num.global.knots + 2)[-c(1, num.global.knots + 2)])
> subject.knots = quantile(unique(df$x), seq(0, 1, length =
> num.subject.knots + 2)[-c(1, num.subject.knots + 2)])
>
> Z.global <- ZOSull(df$x, range.x = range(df$x), global.knots)
> Z.group <- df$black * Z.global
> Z.subject <- ZOSull(df$x, range.x = range(df$x), subject.knots)
>
> Zblock <- list(
> dummyID = pdIdent(~ 0 + Z.global),
> dummyID = pdIdent(~ 0 + Z.group),
> idnum = pdSymm(~ x),
> idnum = pdIdent(~ 0 + Z.subject)
> )
>
> df$dummyID <- dummyID
> tmp_data <- c(
> df,
> X = list(X),
> Z.global = list(Z.global),
> Z.group = list(Z.global),
> Z.subject = list(Z.subject)
> )
>
> fit <- lme(y ~ 0 + X,
> data = tmp_data,
> random = Zblock
> )
>
> }
>
> # this works (warning - lme takes awhile to fit)
> analyze_this2(growthIndiana)
>
> sessionInfo()
> #> R version 4.1.2 (2021-11-01)
> #> Platform: x86_64-w64-mingw32/x64 (64-bit) #> Running under: Windows
> 10 x64 (build 22000) #> #> Matrix products: default #> #> locale:
> #> [1] LC_COLLATE=English_United States.1252 #> [2]
> LC_CTYPE=English_United States.1252 #> [3] LC_MONETARY=English_United
> States.1252 #> [4] LC_NUMERIC=C #> [5] LC_TIME=English_United
> States.1252 #> #> attached base packages:
> #> [1] stats graphics grDevices utils datasets methods base
> #>
> #> other attached packages:
> #> [1] nlme_3.1-153 HRW_1.0-5
> #>
> #> loaded via a namespace (and not attached):
> #> [1] lattice_0.20-45 digest_0.6.29 withr_2.4.3 grid_4.1.2
> #> [5] magrittr_2.0.1 reprex_2.0.1 evaluate_0.14 KernSmooth_2.23-20
> #> [9] highr_0.9 stringi_1.7.6 rlang_0.4.12 cli_3.1.0
> #> [13] rstudioapi_0.13 fs_1.5.2 rmarkdown_2.11 splines_4.1.2
> #> [17] tools_4.1.2 stringr_1.4.0 glue_1.6.0 xfun_0.29
> #> [21] yaml_2.2.1 fastmap_1.1.0 compiler_4.1.2 htmltools_0.5.2
> #> [25] knitr_1.37
>
> Created on 2022-01-07 by the [reprex
> package](https://usg02.safelinks.protection.office365.us/?url=https%3A
> %2F%2Freprex.tidyverse.org%2F&data=04%7C01%7Cmkey%40infoscitex.com
> %7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e985038b58
> %7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAw
> MDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=hH
> 5ncnSVZGQ4KZsvLpvjgM%2Fgf9Zu5QnoeSntTrCZWjk%3D&reserved=0)
> (v2.0.1)
>
> Here's the session Info for the Mac:
>
> sessionInfo()
> R version 4.1.2 (2021-11-01)
> Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS
> Monterey 12.1
>
> Matrix products: default
> LAPACK:
> /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRl
> apack.dylib
>
> locale:[1]
> en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] nlme_3.1-153 targets_0.8.1 HRW_1.0-5
>
> loaded via a namespace (and not attached):
> [1] compiler_4.1.2 pillar_1.6.4 tools_4.1.2 digest_0.6.28 lattice_0.20-45 jsonlite_1.7.2 evaluate_0.14
> [8] lifecycle_1.0.1 tibble_3.1.6 pkgconfig_2.0.3 rlang_0.4.12 igraph_1.2.9 cli_3.1.0 yaml_2.2.1
> [15] xfun_0.28 fastmap_1.1.0 withr_2.4.2 knitr_1.36 htmlwidgets_1.5.4 vctrs_0.3.8 grid_4.1.2
> [22] tidyselect_1.1.1 glue_1.5.0 data.table_1.14.2 R6_2.5.1 processx_3.5.2 fansi_0.5.0 rmarkdown_2.11
> [29] callr_3.7.0 purrr_0.3.4 magrittr_2.0.1 scales_1.1.1 codetools_0.2-18 ps_1.6.0 htmltools_0.5.2
> [36] ellipsis_0.3.2 splines_4.1.2 colorspace_2.0-2 KernSmooth_2.23-20 utf8_1.2.2 visNetwork_2.1.0 munsell_0.5.0
> [43] crayon_1.4.2
>
>
> Melissa Key, Ph.D.
> Statistician
> Non-Invasive Brain Stimulation Team
> Infoscitex
>
> Cell: 937-550-4981
> Email: mkey using infoscitex.com<mailto:mkey using infoscitex.com>
> Base email:
> melissa.key.ctr using us.af.mil<mailto:melissa.key.ctr using us.af.mil>
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://usg02.safelinks.protection.office365.us/?url=https%3A%2F%2Fsta
> t.ethz.ch%2Fmailman%2Flistinfo%2Fr-help&data=04%7C01%7Cmkey%40info
> scitex.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690
> e985038b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIj
> oiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&am
> p;sdata=4c%2Bnu06ZfIKpwowJCTz2mIl7kAEQR5vbZSw8pTuqumk%3D&reserved=
> 0 PLEASE do read the posting guide
> https://usg02.safelinks.protection.office365.us/?url=http%3A%2F%2Fwww.
> r-project.org%2Fposting-guide.html&data=04%7C01%7Cmkey%40infoscite
> x.com%7C57e0067b32a24526600c08d9d2258bd8%7C3430c5de5e6f4f9d9d9690e9850
> 38b58%7C0%7C0%7C637771881005380638%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4
> wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sda
> ta=iT8fu9JX5i7gUfo0T2gQMVdHEm%2FSThakp2yaOJ2pOBQ%3D&reserved=0
> and provide commented, minimal, self-contained, reproducible code.
More information about the R-help
mailing list