[R] tidyverse: read_csv() misses column
Bill Dunlap
w||||@mwdun|@p @end|ng |rom gm@||@com
Mon Nov 1 18:34:59 CET 2021
Use the col_types argument to specify your column types. [Why would you
expect '2009' to be read as a string instead of a number?] It looks like an
initial zero causes an otherwise numeric-looking entry to be considered
a string (handy for zip codes in the northeastern US).
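
For illustration, a minimal sketch of passing col_types explicitly. The
two-row sample file is made up, and the chosen types simply mirror the
str() output quoted further down in this thread; adjust them to whatever
you actually want each column to be.

```r
library(readr)

# Hypothetical sample with the same column names as the file in this thread.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "site_nbr,year,mon,day,hr,min,tz,disc",
  "14171600,2009,10,23,0,0,PDT,8750",
  "14171600,2009,10,23,0,15,PDT,8750"
), tmp)

# State every column type up front so nothing is left to the guesser.
cor_disc <- read_csv(
  tmp,
  col_types = cols(
    site_nbr = col_character(),
    year     = col_integer(),
    mon      = col_integer(),
    day      = col_integer(),
    hr       = col_integer(),
    min      = col_integer(),
    tz       = col_character(),
    disc     = col_integer()
  )
)

str(cor_disc)
```

The compact string form col_types = "ciiiiici" (c = character, i = integer,
one letter per column) should be equivalent here.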
help(read_csv) says the column type guessing is "not robust", and its
algorithm doesn't seem to be documented in the help file:
col_types
One of NULL, a cols() specification, or a string. See vignette("readr") for
more details.
If NULL, all column types will be imputed from guess_max rows on the input
interspersed throughout the file. This is convenient (and fast), but not
robust. If the imputation fails, you'll need to increase the guess_max or
supply the correct types yourself.
...
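
If you would rather keep the guessing, here is a rough sketch of raising
guess_max and then checking what was guessed with spec(). The tiny zip/year
file is a made-up example, and 100000 is an arbitrary value, not a
recommendation.

```r
library(readr)

# Hypothetical file: one column with leading zeros, one plain numeric column.
tmp <- tempfile(fileext = ".csv")
writeLines(c("zip,year", "02134,2009", "10001,2010"), tmp)

# Let read_csv() guess, but allow it to look at more rows than the default.
guessed <- read_csv(tmp, guess_max = 100000)

# Show the column specification that read_csv() settled on.
spec(guessed)
```

The printed specification can be copied into a col_types = cols(...) call
and edited by hand wherever the guess is not what you want.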
-Bill
On Mon, Nov 1, 2021 at 10:16 AM Rich Shepard <rshepard using appl-ecosys.com>
wrote:
>
> On Mon, 1 Nov 2021, Kevin Thorpe wrote:
>
> > I do not have a specific answer to your particular problem. All I can say
> > is when a CSV import doesn’t work, it can mean there is something in the
> > CSV file that is unexpected. When read_csv() fails, I will try read.csv()
> > to compare the results.
>
> Kevin,
>
> Interesting that there's no error:
> cor_disc <- read.csv("../data/cor-disc.csv", header = TRUE)
> ...
> 12496 14171600 2010 3 15 16 45 PDT 1060
> 12497 14171600 2010 3 15 17 0 PDT 1060
> 12498 14171600 2010 3 15 17 15 PDT 1050
> 12499 14171600 2010 3 15 17 45 PDT 1050
> [ reached 'max' / getOption("max.print") -- omitted 402856 rows ]
> > head(cor_disc)
> site_nbr year mon day hr min tz disc
> 1 14171600 2009 10 23 0 0 PDT 8750
> 2 14171600 2009 10 23 0 15 PDT 8750
> 3 14171600 2009 10 23 0 30 PDT 8750
> 4 14171600 2009 10 23 0 45 PDT 8750
> 5 14171600 2009 10 23 1 0 PDT 8750
> 6 14171600 2009 10 23 1 15 PDT 8750
> > str(cor_disc)
> 'data.frame': 415355 obs. of 8 variables:
> $ site_nbr: chr "14171600" "14171600" "14171600" "14171600" ...
> $ year : int 2009 2009 2009 2009 2009 2009 2009 2009 2009 2009 ...
> $ mon : int 10 10 10 10 10 10 10 10 10 10 ...
> $ day : int 23 23 23 23 23 23 23 23 23 23 ...
> $ hr : int 0 0 0 0 1 1 1 1 2 2 ...
> $ min : int 0 15 30 45 0 15 30 45 0 15 ...
> $ tz : chr "PDT" "PDT" "PDT" "PDT" ...
> $ disc : int 8750 8750 8750 8750 8750 8750 8750 8730 8730 8730 ...
>
> So, where might I look to see why tidyverse's read_csv() doesn't produce the
> same results?
>
> Regards,
>
> Rich
>