[R] Read_fwf in package readr, double vs. numeric
Sarah Goslee
sarah.goslee using gmail.com
Wed Apr 24 17:43:54 CEST 2019
And just for thoroughness, I meant that it works in readr 1.3.1, as my
sessionInfo (but not what I typed myself) said. Sorry for the typo,
but I'm glad it solved your problem nonetheless.
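
For anyone who wants to check this in a script before trusting the parsed values, here is a minimal sketch of a version comparison. The helper name readr_at_least is hypothetical (it is not part of readr), and the 1.3.1 cutoff is only what this thread reports as working; releases between 1.1.1 and 1.3.1 were not tested here, so the true cutoff may well be lower.

```r
# Hypothetical helper (not part of readr): TRUE if the given version string
# is at least the version reported to work in this thread.
# Assumption: 1.3.1 is used as the cutoff only because it is the version
# that worked here; intermediate releases were not checked.
readr_at_least <- function(installed, required = "1.3.1") {
  package_version(installed) >= package_version(required)
}

readr_at_least("1.1.1")  # version from the session that showed the problem
readr_at_least("1.3.1")  # version from the sessions that worked

# Check the locally installed copy, if there is one.
if (requireNamespace("readr", quietly = TRUE)) {
  readr_at_least(as.character(utils::packageVersion("readr")))
}
```

package_version() compares version components numerically, which avoids the pitfalls of comparing version strings as text.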
Sarah
On Wed, Apr 24, 2019 at 11:38 AM Doran, Harold <HDoran using air.org> wrote:
>
> Thank you, Sarah. It seems that updating to a newer version does indeed solve the problem. For completeness, the first sessionInfo() below is from the version in which it works properly, and the second is from the version in which I observe the problem I described.
>
> > sessionInfo()
> R version 3.5.3 (2019-03-11)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] readr_1.3.1
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.3 assertthat_0.2.1 R6_2.4.0 cli_1.1.0 hms_0.4.2
> [6] tools_3.5.3 pillar_1.3.1 tibble_2.1.1 Rcpp_1.0.1 crayon_1.3.4
> [11] utf8_1.1.4 fansi_0.4.0 pkgconfig_2.0.2 rlang_0.3.4
>
> > sessionInfo()
> R version 3.4.2 (2017-09-28)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 7 x64 (build 7601) Service Pack 1
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
> [5] LC_TIME=English_United States.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] readr_1.1.1
>
> loaded via a namespace (and not attached):
> [1] compiler_3.4.2 assertthat_0.2.0 R6_2.2.2 cli_1.0.0 hms_0.3 tools_3.4.2
> [7] pillar_1.3.0 tibble_1.4.2 Rcpp_1.0.0 crayon_1.3.4 utf8_1.1.4 fansi_0.2.3
> [13] rlang_0.3.0.1
>
> -----Original Message-----
> From: Sarah Goslee <sarah.goslee using gmail.com>
> Sent: Wednesday, April 24, 2019 11:12 AM
> To: Doran, Harold <HDoran using air.org>
> Cc: r-help using r-project.org
> Subject: Re: [R] Read_fwf in package readr, double vs. numeric
>
> Hi,
>
> I can't reproduce your problem: with readr 1.1.1 on linux, it works as expected. Letting read_fwf guess the types also works fine. (See
> below.)
>
> If you aren't running the current version of readr, update and retry.
> If you are, then we probably need more info, at least sessionInfo().
>
> Sarah
>
>
>
> library(readr)
> myFile <- "foo.txt"
> pos <- fwf_positions(c(1,2,7), c(1,6,10))
>
>
> type <- c('N','D','N')
> types <- paste0(type, collapse = '')
> types <- chartr('NCD', 'ncd', types)
> read_fwf(file = myFile, col_positions = pos, col_types = types)
>
> # A tibble: 3 x 3
> X1 X2 X3
> <dbl> <dbl> <dbl>
> 1 1 1.00e-20 1043
> 2 1 7.12e+ 4 1043
> 3 1 9.12e+ 4 1055
>
>
> type <- c('N','N','N')
> types <- paste0(type, collapse = '')
> types <- chartr('NCD', 'ncd', types)
> read_fwf(file = myFile, col_positions = pos, col_types = types)
>
> # A tibble: 3 x 3
> X1 X2 X3
> <dbl> <dbl> <dbl>
> 1 1 1.00e-20 1043
> 2 1 7.12e+ 4 1043
> 3 1 9.12e+ 4 1055
>
>
>
> > read_fwf(file = myFile, col_positions = pos, col_types = NULL)
> Parsed with column specification:
> cols(
> X1 = col_double(),
> X2 = col_double(),
> X3 = col_double()
> )
> # A tibble: 3 x 3
> X1 X2 X3
> <dbl> <dbl> <dbl>
> 1 1 1.00e-20 1043
> 2 1 7.12e+ 4 1043
> 3 1 9.12e+ 4 1055
>
>
>
>
> > sessionInfo()
> R version 3.5.3 (2019-03-11)
> Platform: x86_64-redhat-linux-gnu (64-bit)
> Running under: Fedora 28 (Workstation Edition)
>
> Matrix products: default
> BLAS/LAPACK: /usr/lib64/R/lib/libRblas.so
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] readr_1.3.1 colorout_1.2-0
>
> loaded via a namespace (and not attached):
> [1] compiler_3.5.3 assertthat_0.2.0 R6_2.4.0 cli_1.0.1
> [5] hms_0.4.2 tools_3.5.3 pillar_1.3.1 tibble_2.0.1
> [9] Rcpp_1.0.0 crayon_1.3.4 utf8_1.1.4 fansi_0.4.0
> [13] pkgconfig_2.0.2 rlang_0.3.1
>
>
> On Wed, Apr 24, 2019 at 10:56 AM Doran, Harold <HDoran using air.org> wrote:
> >
> > Suppose I have the following data sitting in a fwf file 'foo.txt'. The point of this email is to ask the group how to properly read in the value in this pseudo-data "1e-20" using the read_fwf function in the package readr.
> >
> > 11e-201043
> > 1712201043
> > 1912201055
> >
> > First, suppose I do it this way, where in this case "D" is used for double precision.
> >
> > library(readr)
> > myFile <- "foo.txt"
> > pos <- fwf_positions(c(1,2,7), c(1,6,10))
> > type <- c('N','D','N')
> > types <- paste0(type, collapse = '')
> > types <- chartr('NCD', 'ncd', types)
> >
> > read_fwf(file = myFile, col_positions = pos, col_types = types)
> >
> > # A tibble: 3 x 3
> > X1 X2 X3
> > <dbl> <dbl> <dbl>
> > 1 1 1.00e-20 1043
> > 2 1 7.12e+ 4 1043
> > 3 1 9.12e+ 4 1055
> >
> > This seemingly works well and properly captures the value. However, if
> > I instead were to indicate to the function that *all* of my columns
> > were numeric (just insert this one line in lieu of the other above)
> >
> > type <- c('N','N','N')
> >
> > # A tibble: 3 x 3
> > X1 X2 X3
> > <dbl> <dbl> <dbl>
> > 1 1 1 1043
> > 2 1 71220 1043
> > 3 1 91220 1055
> >
> > The read-in is not correct: 1e-20 comes back as 1. Here is the pragmatic issue. I have a legacy program that writes out the layout structure of the fwf file (start and end positions) and also indicates the column types. The layout file we receive always uses a column type of numeric (N) for any numeric column (including the column holding values such as 1e-20).
> >
> > This layout file will not change so I need to figure out how to solve the problem within my read in program. I suppose one option is that I could manually change any values of "N" to "D" in my R code. That seems to work. But not sure if that is the "right" way to solve this issue.
> >
> > Thanks
> > Harold
> >
> > ______________________________________________
> > R-help using r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Sarah Goslee (she/her)
> http://www.numberwright.com
>
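
On the question in the quoted message about changing "N" to "D" in the R code: if updating readr is not an option, that remapping can be kept in one place. In readr's compact col_types string, 'n' selects col_number() and 'd' selects col_double(), so the sketch below sends legacy "N" columns through the double parser. The helper name layout_to_col_types is hypothetical, and it assumes the layout file only ever uses the codes N, C and D, as in the chartr() call above.

```r
# Hypothetical helper (not part of readr): build a compact col_types string
# from the legacy layout codes, treating "N" as double.
# Assumption: the layout file only contains the codes "N", "C" and "D".
layout_to_col_types <- function(type) {
  type <- toupper(type)
  stopifnot(all(type %in% c("N", "C", "D")))
  # Promote legacy "N" to "D" so values such as 1e-20 are parsed as doubles.
  type[type == "N"] <- "D"
  # Lower-case the codes: 'c' = character, 'd' = double in readr's notation.
  chartr("CD", "cd", paste0(type, collapse = ""))
}

types <- layout_to_col_types(c("N", "N", "N"))
types

# Usage with the objects from the quoted example, if readr is installed:
# read_fwf(file = myFile, col_positions = pos, col_types = types)
```

The stopifnot() guard makes an unexpected code in the layout file fail loudly instead of being passed on to read_fwf unchanged.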
--
Sarah Goslee (she/her)
http://www.sarahgoslee.com