[R] bug(?) in str() with strict.width = "cut" when applied to dataframe with numeric component AND factor or character component with longer levels/strings
Gerrit Eichner
Gerrit.Eichner at math.uni-giessen.de
Wed Oct 16 10:59:22 CEST 2013
Dear Duncan,
unfortunately, I have to correct myself: I _can_ reproduce the problem
after changing the global width option to, say, 70. Using the data
frame X from before with the 'factory-fresh' setting for width and
executing
> str( X, strict.width = "cut")
'data.frame': 11 obs. of 2 variables:
$ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+05 ...
$ B: Factor w/ 1 level "zjtvorkmoydsepnxkabmeondrjaanutjmfxlgzmrbjp": 1 1 1 1..
produces the correct output. But
> oo <- options( width = 70)
> str( X, strict.width = "cut")
'data.frame': 11 obs. of 2 variables:
$ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
$ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
is obviously the wrong output I reported previously. Restoring the old
options "solves" the problem:
> options( oo)
> str( X, strict.width = "cut")
'data.frame': 11 obs. of 2 variables:
$ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+05 ...
$ B: Factor w/ 1 level "zjtvorkmoydsepnxkabmeondrjaanutjmfxlgzmrbjp": 1 1 1 1..
Is that reproducible for you?
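For convenience, the steps above can be put into one self-contained script. This is only a sketch: the helper name str_lines() is made up for illustration, and the widths 80 (the factory-fresh default) and 70 are the two settings compared above. Note that in R >= 4.0.0 data.frame() no longer converts strings to factors by default, and that sample() changed in R 3.6.0, so component B and the generated string may differ from the output shown in this message.

```r
# Rebuild the example data frame from the original post (quoted below).
k <- 43      # length of the level / character string
n <- 11      # number of rows of the data frame
M <- 10000   # order of magnitude of the numeric values

set.seed(47)
longer.char.string <- paste(sample(letters, k, replace = TRUE), collapse = "")
X <- data.frame(A = 1:n * M, B = rep(longer.char.string, n))

# Hypothetical helper (not part of R): capture the lines that
# str(x, strict.width = "cut") prints at a given console width.
str_lines <- function(x, width) {
  oo <- options(width = width)
  on.exit(options(oo))   # restore the previous width on exit
  capture.output(str(x, strict.width = "cut"))
}

for (w in c(80, 70)) {
  out <- str_lines(X, w)
  cat(sprintf("--- width = %d ---\n", w))
  writeLines(out)
  # If the problem occurs, the line for component B is missing.
  cat("line for B present:", any(grepl("$ B:", out, fixed = TRUE)), "\n")
}
```

Because the helper restores the previous width itself, the session is left with its original options after each call.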
Regards -- Gerrit
PS: "New" session info:
> sessionInfo()
R version 3.0.2 (2013-09-25)
Platform: x86_64-w64-mingw32/x64 (64-bit)
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] fortunes_1.5-0
loaded via a namespace (and not attached):
[1] tools_3.0.2
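Regarding Duncan's earlier suggestion (quoted below) that an old copy of str() in the workspace might be involved, a quick check could look like the following. It is a sketch only; the variable name masked and the small example data frame are chosen for illustration.

```r
# Is there an object named "str" in the global workspace that would
# mask utils::str()?
masked <- exists("str", envir = globalenv(), inherits = FALSE)
cat("str() masked in workspace:", masked, "\n")

# Where on the search path is "str" found? Normally only "package:utils".
print(find("str"))

# Which environment does the str() that would be called belong to?
# For the unmasked function this is the utils namespace.
cat("str() defined in:", environmentName(environment(str)), "\n")

# Calling the namespace-qualified function sidesteps any masking.
utils::str(data.frame(a = 1:3), strict.width = "cut")
```

If masked is TRUE, rm(str) removes the workspace copy so that the version from utils is used again.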
On Wed, 16 Oct 2013, Gerrit Eichner wrote:
> Thanks, Duncan,
>
> for the good (indirect) hint: after a restart of R the problem is --
> fortunately :-) -- not reproducible anymore for me either. The R session had
> been running for a longer time and I recall doing some (system-related)
> things outside of R that may have interfered with it; I just forgot to take
> that possibility into consideration. :(
>
> Regards -- Gerrit
>
> On Tue, 15 Oct 2013, Duncan Murdoch wrote:
>
>> On 15/10/2013 7:53 AM, Gerrit Eichner wrote:
>>> Dear list subscribers,
>>>
>>> here is a small artificial example to demonstrate the problem that I
>>> encountered when looking at the structure of a (larger) data frame that
>>> comprised (among other components)
>>>
>>> a numeric component of elements of the order of > 10000, and
>>>
>>> a factor or character component with longer levels/strings:
>>>
>>>
>>> k <- 43 # length of levels or character strings
>>> n <- 11 # number of rows of data frame
>>> M <- 10000 # order of magnitude of numerical values
>>>
>>> set.seed( 47) # to reproduce the following artificial character string
>>> longer.char.string <- paste( sample( letters, k, replace = TRUE),
>>> collapse = "")
>>>
>>> X <- data.frame( A = 1:n * M,
>>> B = rep( longer.char.string, n))
>>>
>>>
>>> The following call to str() apparently gives a wrong result
>>>
>>> str( X, strict.width = "cut")
>>>
>>> 'data.frame': 11 obs. of 2 variables:
>>> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>>> $ A: num 1e+04 2e+04 3e+04 4e+04 5e+04 6e+04 7e+04 8e+04 9e+04 1e+..
>>>
>>>
>>> whereas the correct result appears for str( X) or if you decrease k to 42
>>> (isn't that "the answer"? ;-) ) or n to 10 or M to 1000 (or smaller,
>>> respectively).
>>>
>>>
>>> I tried to dig into the entrails of str.default(), where the cause may
>>> lie, but got lost pretty soon. So, I am hoping that someone may already
>>> have a work-around or patch (or dares to dig further)? Thank you for any
>>> feedback!
>>
>> I can't reproduce this. I don't have a 64 bit copy of 3.0.2 handy, but I
>> don't see it in 64 bit 3.0.1, or 64 bit 3.0.2-patched, or various 32 bit
>> versions.
>>
>> Is it reproducible for you? It looks to me as though (if it isn't just
>> something weird on your system, e.g. an old copy of str() in your
>> workspace), it might be a memory protection problem: something needed to
>> be duplicated but wasn't. But unless I can see it happen, I can't start to
>> fix it.
>>
>> Duncan Murdoch
>>
>>>
>>> Best regards -- Gerrit
>>>
>>> PS:
>>>
>>> > sessionInfo()
>>>
>>> R version 3.0.2 (2013-09-25)
>>> Platform: x86_64-w64-mingw32/x64 (64-bit)
>>>
>>> locale:
>>> [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
>>> [3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
>>> [5] LC_TIME=German_Germany.1252
>>>
>>> attached base packages:
>>> [1] splines stats graphics grDevices utils datasets
>>> [7] methods base
>>>
>>> other attached packages:
>>> [1] nparcomp_2.0 multcomp_1.2-21 mvtnorm_0.9-9996
>>> [4] car_2.0-19 Hmisc_3.12-2 Formula_1.1-1
>>> [7] survival_2.37-4 fortunes_1.5-0
>>>
>>> loaded via a namespace (and not attached):
>>> [1] cluster_1.14.4 grid_3.0.2 lattice_0.20-23 MASS_7.3-29
>>> [5] nnet_7.3-7 rpart_4.1-3 stats4_3.0.2 tools_3.0.2
>>>
>>> ---------------------------------------------------------------------
>>> Dr. Gerrit Eichner Mathematical Institute, Room 212
>>> gerrit.eichner at math.uni-giessen.de Justus-Liebig-University Giessen
>>> Tel: +49-(0)641-99-32104 Arndtstr. 2, 35392 Giessen, Germany
>>> Fax: +49-(0)641-99-32109 http://www.uni-giessen.de/cms/eichner
>>>
>>> ______________________________________________
>>> R-help at r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.