[R] retaining characters in a csv file
Rolf Turner
r.turner at auckland.ac.nz
Wed Sep 23 00:33:13 CEST 2015
On 23/09/15 10:00, Therneau, Terry M., Ph.D. wrote:
> I have a csv file from an automatic process (so this will happen
> thousands of times), for which the first row is a vector of variable
> names and the second row often starts something like this:
>
> 5724550,"000202075214",2005.02.17,2005.02.17,"F", .....
>
> Notice the second variable, which is
>    - a character string (note the quotation marks)
>    - a sequence of numeric digits
>    - leading zeros are significant
>
> The read.csv function insists on turning this into a numeric. Is there
> any simple set of options that will turn this behavior off? I'm looking
> for a way to tell it to "obey the bloody quotes" -- I still want the
> first, third, etc columns to become numeric. There can be more than one
> variable like this, and not always in the second position.
>
> This happens deep inside the httr library; there is an easy way for me
> to add more options to the read.csv call but it is not so easy to
> replace it with something else.
IMHO this is a bug in read.csv().
A possible workaround:

    ccc <- c("integer","character",rep(NA,k))
    X <- read.csv("melvin.csv",colClasses=ccc)

where "melvin.csv" is the file from which you are attempting to read and
where k+2 = the number of columns in that file.
Kludgey, but it might work.
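
To make the colClasses workaround concrete, here is a minimal sketch that builds a one-row file from the sample line in the question and reads it back. The column names (id, code, start, end, flag) and the temporary file are assumptions made up for illustration; only the data row comes from the original post. Since the awkward column is "not always in the second position", the sketch also tries a named colClasses vector, which read.csv matches against the header names and treats unnamed columns as NA (i.e. converted as usual).

```r
# Hypothetical header; the data row is the one quoted in the question.
csv_lines <- c(
  "id,code,start,end,flag",
  "5724550,\"000202075214\",2005.02.17,2005.02.17,\"F\""
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# Default behaviour: the quoted digit string is converted to a number,
# so the leading zeros are lost.
X0 <- read.csv(tmp)

# Positional workaround from the post: k + 2 columns in total, here k = 3.
k <- 3
ccc <- c("integer", "character", rep(NA, k))
X1 <- read.csv(tmp, colClasses = ccc)

# Named variant: only the listed column is forced to character,
# wherever it happens to sit in the file.
X2 <- read.csv(tmp, colClasses = c(code = "character"))

unlink(tmp)

X1$code   # character, leading zeros kept
X2$code   # same result without counting columns
```

The named form depends on the header being present and on knowing the offending column names in advance, so it may or may not suit the automatic process described above.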
Another workaround is to specify quote="", but this has the side effect
of making the 5th column character rather than logical.
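
A short sketch of the quote="" route, under the same assumptions as before (made-up column names and a temporary file; only the data row is from the question). With quoting switched off, the quotation marks are read as ordinary characters, so they stay in the values and would have to be stripped afterwards. stringsAsFactors=FALSE is passed explicitly so the result does not depend on the R version's default.

```r
# Hypothetical header; the data row is the one quoted in the question.
csv_lines <- c(
  "id,code,start,end,flag",
  "5724550,\"000202075214\",2005.02.17,2005.02.17,\"F\""
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# quote = "" disables quote handling, so "..." is kept literally.
Y <- read.csv(tmp, quote = "", stringsAsFactors = FALSE)
unlink(tmp)

class(Y$id)     # still converted to a number
class(Y$flag)   # character, not logical: the side effect mentioned above

# The digit string keeps its leading zeros but also its quotation marks;
# remove them in a separate step.
Y$code_clean <- gsub("\"", "", Y$code, fixed = TRUE)
```

This would also misparse any quoted field that contains a comma, so it seems safe only if the files never have such fields.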
cheers,
Rolf
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276