[R] dynamic variable creation in lists and data frames
Marc Schwartz
marc_schwartz at comcast.net
Tue Dec 5 21:21:48 CET 2006
On Tue, 2006-12-05 at 14:41 -0500, Daniel Lee Rabosky wrote:
> Hi
>
> I have a question about the creation of variables within lists in R. I am
> running simulations and am interested in two parameters, ESM and ESMM (the
> similarity of these names is important for my question). I do simulations
> to generate ESMM, then plug these values into a second simulation function
> to get ESM:
>
> x <- list()
>
> for (i in 1:nsimulations)
> {
> x$ESMM[i] <- do_simulation1()
> x$ESM[i] <- do_simulation2(x$ESMM[i])
> }
>
> and I return everything as a dataframe, x <- as.data.frame(x)
>
> When I do this, I find that x$ESMM is overwritten by x$ESM for the first
> simulation. However, x$ESM is nonetheless correctly generated using
> x$ESMM.
>
> Thus, x$ESM[1] = x$ESMM[1], but for the other n-thousand simulations,
> ESMM is not overwritten; the error only occurs on the first instance of
> ESM.
>
> I think I know why this is occurring: I am creating a new variable in a
> list and assigning it a value, but when R can’t find the variable, it
> overwrites the next most similar variable (ESMM). But it still proceeds
> to create the new variable ESM, having overwritten x$ESMM[1]. And it
> doesn’t happen for subsequent simulations, because both variables then
> exist in the list.
>
> My questions are:
> 1) how different do variable names have to be to avoid this problem? What
> exactly is R using to decide that ESMM is the same as ESM?
>
> or
>
> 2) is there something fundamentally flawed with the manner in which I
> dynamically create variables in lists, without initializing them in some
> fashion? This approach worked fine until I noticed this issue with
> variables having similar names.
>
> Thanks very much in advance for your help.
>
> Dan Rabosky
This is caused by partial matching of names when indexing data frame
columns and list elements. It is the default behavior in R, and if you
search the archives using:
RSiteSearch("partial matching")
you will find prior discussions of it.
A simple example:
> x <- list()
> x
list()
> x$ESMM[1] <- 1
> x
$ESMM
[1] 1

> x$ESM[1] <- 2
> x
$ESMM
[1] 2

$ESM
[1] 2

Both values are changed: x$ESM does not yet exist, so the assignment
partially matches x$ESMM and modifies it, and then x$ESM is created.
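As a rough sketch of two ways around this (an illustration, not part of the original session): reading with [[ ]] matches names exactly by default in current versions of R, and creating both list elements before the loop means x$ESM always has an exact match. The names partial, exact and sims are made up for the example, and do_simulation1() and do_simulation2() are hypothetical stand-ins for the real simulation functions.

```r
# Reading: `$` matches names partially, `[[` matches exactly by default
x <- list(ESMM = 1)
partial <- x$ESM        # partially matches ESMM
exact <- x[["ESM"]]     # no element is named exactly "ESM"

# Hypothetical stand-ins for the two simulation functions
do_simulation1 <- function() runif(1)
do_simulation2 <- function(esmm) esmm + 1

# Create both elements up front, so that sims$ESM always matches exactly
nsimulations <- 5
sims <- list(ESMM = numeric(nsimulations), ESM = numeric(nsimulations))
for (i in seq_len(nsimulations)) {
  sims$ESMM[i] <- do_simulation1()
  sims$ESM[i] <- do_simulation2(sims$ESMM[i])
}
sims <- as.data.frame(sims)
```

Preallocating also avoids growing the vectors one element at a time inside the loop.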
I think that in this particular situation, you might want to try:
# Create a simple function that returns pairs of random samples from
# 'letters', which is a:z
Sim <- function()
{
  list(ESMM = letters[sample(26, 1)],
       ESM = letters[sample(26, 1)])
}
# Run it once
> Sim()
$ESMM
[1] "l"
$ESM
[1] "z"
Now use replicate() to do this 10 times. Note the default behavior is to
simplify the returned values into a matrix.
> x <- replicate(10, Sim())
> x
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
ESMM "x"  "q"  "c"  "f"  "e"  "f"  "y"  "d"  "z"  "h"
ESM  "u"  "c"  "j"  "v"  "u"  "j"  "o"  "p"  "g"  "g"
So, in your case create a function Sim() like this:
Sim <- function()
{
  ESMM <- do_simulation1()
  ESM <- do_simulation2(ESMM)
  list(ESMM = ESMM, ESM = ESM)
}
and then use replicate() as above. See ?replicate for more information.
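Since the goal was a data frame with one row per simulation, here is one possible way to get there from replicate(). This is a sketch only: do_simulation1() and do_simulation2() are hypothetical stand-ins, and res is a name made up for the example. Passing simplify = FALSE keeps each run as its own list, which can then be turned into a one-row data frame and stacked.

```r
# Hypothetical stand-ins for the two simulation functions
do_simulation1 <- function() runif(1)
do_simulation2 <- function(esmm) esmm + 1

Sim <- function()
{
  ESMM <- do_simulation1()
  ESM <- do_simulation2(ESMM)
  list(ESMM = ESMM, ESM = ESM)
}

# Keep each run as a separate list rather than simplifying to a matrix
res <- replicate(5, Sim(), simplify = FALSE)

# One row per simulation, with columns ESMM and ESM
x <- do.call(rbind, lapply(res, as.data.frame))
```

For several thousand runs it may be quicker to have Sim() return a named numeric vector and transpose the resulting matrix, but the version above stays closest to the function shown earlier.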
HTH,
Marc Schwartz