[R] input string ... cannot be translated to UTF-8, is it valid in 'ANSI_X3.4-1968'?

Duncan Murdoch
Fri Apr 23 05:07:30 CEST 2021

On 22/04/2021 9:25 p.m., Spencer Graves wrote:
> Hello:
> 	  What if anything should I do regarding notes from either "load" or
> "attach" that, "input string ... cannot be translated to UTF-8, is it
> valid in 'ANSI_X3.4-1968'?"?

First, ANSI_X3.4-1968 is an official name for a version of ASCII. 
It appears near the start of the file, where I believe it records the 
native encoding in place when the file was written, so that readers 
using a different encoding can translate.

Your actual file appears to have been encoded in UTF-8, but not marked 
as such.  You're lucky you read it on macOS, where UTF-8 is the native 
encoding: the reader probably recognized that the bytes weren't ASCII 
(and warned you about that), then just left them alone.  If you read 
that file on Windows you'd likely get junk for those entries.
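
One possible fix, sketched under the assumption that the loaded object is a data frame whose character columns (or factor levels) hold valid but unmarked UTF-8: declare the encoding after loading. The helper name `mark_utf8` and the demo data are made up for illustration; `Encoding()`, `validUTF8()` and `rawToChar()` are base R.

```r
# Hypothetical helper: mark unmarked strings as UTF-8 when their bytes
# are valid UTF-8. Strings already marked (e.g. latin1) are left alone.
mark_utf8 <- function(df) {
  mark <- function(x) {
    todo <- !is.na(x) & Encoding(x) == "unknown" & validUTF8(x)
    Encoding(x[todo]) <- "UTF-8"   # pure ASCII strings stay "unknown"
    x
  }
  for (nm in names(df)) {
    if (is.character(df[[nm]])) {
      df[[nm]] <- mark(df[[nm]])
    } else if (is.factor(df[[nm]])) {
      levels(df[[nm]]) <- mark(levels(df[[nm]]))
    }
  }
  df
}

# Demo with made-up data: the UTF-8 bytes of "café", built from raw
# bytes so that the string starts out unmarked.
raw_name <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xc3, 0xa9)))
d <- data.frame(name = c(raw_name, "plain"), stringsAsFactors = FALSE)
fixed <- mark_utf8(d)
Encoding(fixed$name)
```

For the real file, one would load it into an environment (`e <- new.env(); load("NAVCO 1.3 List.RData", envir = e)`), apply the helper to each data frame found there, and optionally `save()` the result again from the UTF-8 session. This only changes how R interprets the bytes; it does not alter the bytes themselves.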

For your interest, here's a dump of the start of your file, after 
gunzipping it:

00000000  52 44 58 33 0a 58 0a 00  00 00 03 00 03 06 00 00  |RDX3.X..........|
00000010  03 05 00 00 00 00 0e 41  4e 53 49 5f 58 33 2e 34  |.......ANSI_X3.4|
00000020  2d 31 39 36 38 00 00 04  02 00 00 00 01 00 04 00  |-1968...........|
00000030  09 00 00 00 01 78 00 00  03 13 00 00 00 10 00 00  |.....x..........|
00000040  02 0e 00 00 02 6e 40 90  0c 00 00 00 00 00 40 90  |.....n@.......@.|
00000050  44 00 00 00 00 00 40 10  00 00 00 00 00 00 40 7c  |D.....@.......@||
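
The header fields visible in the dump can also be read from within R. The following is a minimal sketch, assuming a version 3 save file in the XDR binary format (magic "RDX3", then "X"); `rdata_header` is a hypothetical helper name, and the field layout follows my reading of the serialization format rather than a documented API.

```r
# Hypothetical helper: read the leading header fields of an .RData file.
rdata_header <- function(path) {
  con <- gzfile(path, "rb")    # gzfile() also reads uncompressed files
  on.exit(close(con))
  magic <- readChar(con, 5L, useBytes = TRUE)   # e.g. "RDX3\n"
  fmt   <- readChar(con, 2L, useBytes = TRUE)   # "X\n" means XDR binary
  if (!identical(fmt, "X\n")) stop("only the XDR binary format is handled")
  # Three big-endian integers: serialization version, R version that
  # wrote the file, minimal R version able to read it.
  v <- readBin(con, "integer", n = 3L, size = 4L, endian = "big")
  dotted <- function(x) {
    paste(x %/% 65536L, (x %/% 256L) %% 256L, x %% 256L, sep = ".")
  }
  enc <- NA_character_
  if (v[1] >= 3L) {
    # Version 3 adds a length-prefixed name of the writer's native encoding.
    n <- readBin(con, "integer", n = 1L, size = 4L, endian = "big")
    enc <- readChar(con, n, useBytes = TRUE)
  }
  list(magic = magic,
       serialization_version = v[1],
       written_by = dotted(v[2]),
       min_reader = dotted(v[3]),
       native_encoding = enc)
}

# Demo on a freshly written temporary file.
tmp <- tempfile(fileext = ".RData")
x <- 1
save(x, file = tmp, version = 3)
hdr <- rdata_header(tmp)
str(hdr)
```

Read against the bytes in the dump, the fields would be serialization version 3 (00 00 00 03), written by R 3.6.0 (00 03 06 00), readable from R 3.5.0 (00 03 05 00), and a 14-byte (00 00 00 0e) encoding name, ANSI_X3.4-1968.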

Duncan Murdoch

> 	  I'm running R 4.0.5 under macOS 11.2.3;  see "sessionInfo()" and
> detailed instructions below on the precise file I downloaded from the web
> and tried to read.
> 	  I may be able to get what I want just ignoring this.  However, I'd
> like to know how to fix this.
> 	  Thanks,
> 	  Spencer Graves
> sessionInfo()
> R version 4.0.5 (2021-03-31)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS Big Sur 10.16
> Matrix products: default
> /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
> loaded via a namespace (and not attached):
>    [1] compiler_4.0.5    htmltools_0.5.1.1 tools_4.0.5       yaml_2.2.1
>    [5] tinytex_0.31      rmarkdown_2.7     knitr_1.31        digest_0.6.27
>    [9] xfun_0.22         rlang_0.4.10      evaluate_0.14
>   > search()
>    [1] ".GlobalEnv"                "file:NAVCO 1.3 List.RData"
>    [3] "file:NAVCO 1.3 List.RData" "tools:rstudio"
>    [5] "package:stats"             "package:graphics"
>    [7] "package:grDevices"         "package:utils"
>    [9] "package:datasets"          "package:methods"
> [11] "Autoloads"                 "package:base"
> *** To get the file I used for this, I went to
> "https://www.ericachenoweth.com/research".  From there I clicked
> "Version 1.3".  This took me to
> https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/ON9XND
> I then clicked the "Download" icon to the right of "NAVCO 1.3 List.tab".
>    This gave me 5 "Download Options", one of which was "RData Format";  I
> selected that.  This downloaded "NAVCO 1.3 List.RData", which I moved to
> getwd().  Then I did 'load("NAVCO 1.3 List.RData")' and 'attach("NAVCO
> 1.3 List.RData")'.  Both of those gave me 8 repetitions of a message
> like "input string ... cannot be translated to UTF-8, is it valid in
> 'ANSI_X3.4-1968'?" with different values substituted for "...".
