[R] embedded nuls in 2.10 versus 2.11
Brandon Whitcher
bwhitcher at gmail.com
Tue Mar 2 09:33:23 CET 2010
I have been reading binary files, and parsing the output, for some
time now. I have tried to develop a technique that is as robust as
possible to all the strange things that appear in text fields, not to
mention different global/regional encodings. I have no control over
the data generated by users, so I would like to be as flexible and
accommodating as possible. The following code is straightforward, but
will fail with embedded nuls in R <= 2.10:
fid = file(filename, "rb")
readChar(fid, n=10)
close(fid)
Previous suggestions from the R-help list led me to consider
fid = file(filename, "rb")
rawToChar(readBin(fid, "raw", 10))
close(fid)
or even
fid = file(filename, "rb")
iconv(rawToChar(readBin(fid, "raw", 10)), to="UTF-8")
close(fid)
to ensure that my output is "well behaved". With the new error
handling in rawToChar() in R >= 2.11, embedded nuls are no longer
allowed except at the end of the string. I run across these all the
time in my user data. How can I recover as much of the text as
possible when reading in from a binary file with embedded nuls in R >=
2.11 and keep the code backwards compatible with R < 2.11?
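One candidate, offered only as a minimal sketch: read the bytes as raw, drop every nul byte, and only then call rawToChar(). This assumes that discarding the nuls (rather than splitting the field on them) is acceptable for the data at hand. The helper name read_text_no_nul and the temporary demo file are hypothetical, introduced just for illustration, and I have not checked the behaviour on every R version.

```r
# Hypothetical helper: read up to n bytes from an open binary connection
# and return them as a character string with all nul bytes removed.
read_text_no_nul <- function(con, n) {
  bytes <- readBin(con, what = "raw", n = n)
  bytes <- bytes[bytes != as.raw(0)]  # drop embedded and trailing nuls
  rawToChar(bytes)
}

# Demo on a throwaway file containing "ab", a nul, "cd", and a trailing nul.
tmp <- tempfile()
writeBin(as.raw(c(0x61, 0x62, 0x00, 0x63, 0x64, 0x00)), tmp)

fid <- file(tmp, "rb")
txt <- read_text_no_nul(fid, n = 10)
close(fid)
unlink(tmp)

print(txt)
```

Because the nuls are stripped before rawToChar() ever sees them, the stricter check in R >= 2.11 should not be triggered, and the same code path should be usable on older versions. The result could still be passed through iconv() afterwards if the encoding needs normalising.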
thanks...
Brandon