[R] Read_fwf in package readr, double vs. numeric

Sarah Goslee <sarah.goslee using gmail.com>
Wed Apr 24 17:43:54 CEST 2019


And just for thoroughness, I meant that it works in readr 1.3.1, as my
sessionInfo (but not what I typed myself) said. Sorry for the typo,
but I'm glad it solved your problem nonetheless.
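
For anyone hitting the same thing, here is a minimal sketch of a version check before retrying. The helper name needs_update and its default of "1.3.1" are my own assumptions for illustration (1.3.1 is just the version my sessionInfo reports), not anything readr provides.

```r
# Hypothetical helper: TRUE when `installed` is older than `minimum`.
# The default "1.3.1" is only the version reported to work in this thread.
needs_update <- function(installed, minimum = "1.3.1") {
  package_version(installed) < package_version(minimum)
}

needs_update("1.1.1")  # TRUE
needs_update("1.3.1")  # FALSE

# Check the locally installed readr, if there is one.
if (requireNamespace("readr", quietly = TRUE)) {
  needs_update(as.character(utils::packageVersion("readr")))
}
```

If that returns TRUE, update.packages() or install.packages("readr") should bring in a newer release.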

Sarah

On Wed, Apr 24, 2019 at 11:38 AM Doran, Harold <HDoran using air.org> wrote:
>
> Thank you, Sarah. It seems that updating to a newer version does indeed solve the problem. For completeness, the first sessionInfo() below is from the version in which it works properly, and the second is from the version in which I observe the problem I described.
>
> > sessionInfo()
> R version 3.5.3 (2019-03-11)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] readr_1.3.1
>
> loaded via a namespace (and not attached):
>  [1] compiler_3.5.3   assertthat_0.2.1 R6_2.4.0         cli_1.1.0        hms_0.4.2
>  [6] tools_3.5.3      pillar_1.3.1     tibble_2.1.1     Rcpp_1.0.1       crayon_1.3.4
> [11] utf8_1.1.4       fansi_0.4.0      pkgconfig_2.0.2  rlang_0.3.4
>
> > sessionInfo()
> R version 3.4.2 (2017-09-28)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] readr_1.1.1
>
> loaded via a namespace (and not attached):
>  [1] compiler_3.4.2   assertthat_0.2.0 R6_2.2.2         cli_1.0.0        hms_0.3          tools_3.4.2
>  [7] pillar_1.3.0     tibble_1.4.2     Rcpp_1.0.0       crayon_1.3.4     utf8_1.1.4       fansi_0.2.3
> [13] rlang_0.3.0.1
>
> -----Original Message-----
> From: Sarah Goslee <sarah.goslee using gmail.com>
> Sent: Wednesday, April 24, 2019 11:12 AM
> To: Doran, Harold <HDoran using air.org>
> Cc: r-help using r-project.org
> Subject: Re: [R] Read_fwf in package readr, double vs. numeric
>
> Hi,
>
> I can't reproduce your problem: with readr 1.1.1 on linux, it works as expected. Letting read_fwf guess the types also works fine. (See
> below.)
>
> If you aren't running the current version of readr, update and retry.
> If you are, then we probably need more info, at least sessionInfo().
>
> Sarah
>
>
>
> library(readr)
> myFile <- "foo.txt"
> pos <- fwf_positions(c(1,2,7), c(1,6,10))
>
>
> type <- c('N','D','N')
> types <- paste0(type, collapse = '')
> types <- chartr('NCD', 'ncd', types)
> read_fwf(file = myFile, col_positions = pos, col_types = types)
>
> # A tibble: 3 x 3
>      X1       X2    X3
>   <dbl>    <dbl> <dbl>
> 1     1 1.00e-20  1043
> 2     1 7.12e+ 4  1043
> 3     1 9.12e+ 4  1055
>
>
> type <- c('N','N','N')
> types <- paste0(type, collapse = '')
> types <- chartr('NCD', 'ncd', types)
> read_fwf(file = myFile, col_positions = pos, col_types = types)
>
> # A tibble: 3 x 3
>      X1       X2    X3
>   <dbl>    <dbl> <dbl>
> 1     1 1.00e-20  1043
> 2     1 7.12e+ 4  1043
> 3     1 9.12e+ 4  1055
>
>
>
> > read_fwf(file = myFile, col_positions = pos, col_types = NULL)
> Parsed with column specification:
> cols(
>   X1 = col_double(),
>   X2 = col_double(),
>   X3 = col_double()
> )
> # A tibble: 3 x 3
>      X1       X2    X3
>   <dbl>    <dbl> <dbl>
> 1     1 1.00e-20  1043
> 2     1 7.12e+ 4  1043
> 3     1 9.12e+ 4  1055
>
>
>
>
> > sessionInfo()
> R version 3.5.3 (2019-03-11)
> Platform: x86_64-redhat-linux-gnu (64-bit)
> Running under: Fedora 28 (Workstation Edition)
>
> Matrix products: default
> BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] readr_1.3.1    colorout_1.2-0
>
> loaded via a namespace (and not attached):
>  [1] compiler_3.5.3   assertthat_0.2.0 R6_2.4.0         cli_1.0.1
>  [5] hms_0.4.2        tools_3.5.3      pillar_1.3.1     tibble_2.0.1
>  [9] Rcpp_1.0.0       crayon_1.3.4     utf8_1.1.4       fansi_0.4.0
> [13] pkgconfig_2.0.2  rlang_0.3.1
>
>
> On Wed, Apr 24, 2019 at 10:56 AM Doran, Harold <HDoran using air.org> wrote:
> >
> > Suppose I have the following data sitting in a fwf file 'foo.txt'. The point of this email is to ask the group how to properly read in the value in this pseudo-data "1e-20" using the read_fwf function in the package readr.
> >
> > 11e-201043
> > 1712201043
> > 1912201055
> >
> > First, suppose I do it this way, where in this case "D" is used for double precision.
> >
> > library(readr)
> > myFile <- 'foo.txt'
> > pos <- fwf_positions(c(1,2,7), c(1,6,10))
> > type <- c('N','D','N')
> > types <- paste0(type, collapse = '')
> > types <- chartr('NCD', 'ncd', types)
> >
> > read_fwf(file = myFile, col_positions = pos, col_types = types)
> >
> > # A tibble: 3 x 3
> >      X1       X2    X3
> >   <dbl>    <dbl> <dbl>
> > 1     1 1.00e-20  1043
> > 2     1 7.12e+ 4  1043
> > 3     1 9.12e+ 4  1055
> >
> > This seemingly works well and properly captures the value. However, if
> > I instead were to indicate to the function that *all* of my columns
> > were numeric (just insert this one line in lieu of the other above)
> >
> > type <- c('N','N','N')
> >
> > # A tibble: 3 x 3
> >      X1    X2    X3
> >   <dbl> <dbl> <dbl>
> > 1     1     1  1043
> > 2     1 71220  1043
> > 3     1 91220  1055
> >
> > This read-in is not correct: the 1e-20 in the first row comes back as 1. Here is the pragmatic issue. I have a legacy program that spits out the layout structure of the fwf file (start and end positions) and also indicates what the column types are. The layout file we receive always uses a column type of numeric (N) for any numeric type, including the column holding values such as 1e-20.
> >
> > This layout file will not change, so I need to figure out how to solve the problem within my read-in program. I suppose one option is to manually change any values of "N" to "D" in my R code. That seems to work, but I am not sure if that is the "right" way to solve this issue.
> >
> > Thanks
> > Harold
> >
>
>
>
> --
> Sarah Goslee (she/her)
> http://www.numberwright.com
>


-- 
Sarah Goslee (she/her)
http://www.sarahgoslee.com
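
A follow-up sketch of the workaround Harold mentions (treating every "N" in the layout as "D"), in case updating readr is not an option. In readr's compact col_types string, 'n' means col_number() and 'd' means col_double(), so mapping the legacy code N straight to 'd' requests a double for every numeric column. The helper name layout_to_col_types is hypothetical, and the temporary file simply recreates the three sample rows from the original post.

```r
# Hypothetical helper: translate the legacy layout codes into readr's
# compact col_types string, sending both 'N' and 'D' to 'd' (col_double).
layout_to_col_types <- function(type) {
  chartr("NCD", "dcd", paste0(type, collapse = ""))
}

types <- layout_to_col_types(c("N", "N", "N"))  # "ddd"

# Only attempt the read if readr is available.
if (requireNamespace("readr", quietly = TRUE)) {
  # Recreate the sample fixed-width data from the original post.
  myFile <- tempfile(fileext = ".txt")
  writeLines(c("11e-201043", "1712201043", "1912201055"), myFile)

  pos <- readr::fwf_positions(c(1, 2, 7), c(1, 6, 10))
  print(readr::read_fwf(file = myFile, col_positions = pos, col_types = types))
}
```

This keeps the legacy layout file untouched and confines the change to one line of the translation step.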


