[R] read.csv quotes within fields
Tim Howard
tghoward at gw.dec.state.ny.us
Fri Jan 25 20:35:20 CET 2013
Great point; your fix (quote="") works for the example I gave. Unfortunately, the real text strings have commas in them as well(!). Add a few commas to any of the text strings and it breaks again. Sorry for not including those in the example.
So, I need to incorporate commas *and* quotes with the escape character within a single string.
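One possible workaround, sketched here under assumptions not confirmed in this thread: read the file as raw lines, convert each backslash-escaped quote (\") to the doubled quote ("") that read.csv already understands, and then parse with the default quote="\"" so that commas inside quoted fields survive. The sample lines are hypothetical (the example data with commas added), and the sketch assumes no field ends in a literal backslash right before its closing quote.

```r
# Hypothetical sample: the thread's example with commas added inside fields.
# For a real file, raw <- readLines("test.csv") would take the place of this vector.
raw <- c(
  '"","X1","X2","X3"',
  '"1","1","string one","another string"',
  '"2","2","quotes escaped 10\', 20\\" 5\' 30\\" \\"test string","final, string"',
  '"3","3","third row","last \\" col"'
)

# Replace every literal backslash-quote with a doubled quote.
# fixed = TRUE treats the pattern as plain text, not a regular expression.
fixed <- gsub('\\"', '""', raw, fixed = TRUE)

# Parse with the default quote character so embedded commas stay inside their fields.
dat <- read.csv(text = fixed, stringsAsFactors = FALSE)
print(dat)
```

If the data can contain a real backslash followed by a quote that is meant to close the field, this substitution would misread it, so it is worth checking a few lines by hand first.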
Tim
>>> David Winsemius <dwinsemius at comcast.net> 1/25/2013 2:27 PM >>>
On Jan 25, 2013, at 10:42 AM, Tim Howard wrote:
> All,
>
> I have some csv files I am trying to import. I am finding that quotes inside strings are escaped in a way R doesn't expect for csv files. The problem only seems to rear its ugly head when there is an odd number of internal quotes. I'll try to recreate the problem:
>
> # set up a matrix, using escape-quote as the internal double quote mark.
>
> x <- data.frame(matrix(data=c("1", "string one", "another string", "2", "quotes escaped 10' 20\" 5' 30\" \"test string", "final string", "3","third row","last \" col"),ncol = 3, byrow=TRUE))
>
>> write.csv(x, "test.csv")
>
> # NOTE that write.csv correctly created the three internal quotes ' " ' by using double quotes ' "" '.
> # here's what got written
>
> "","X1","X2","X3"
> "1","1","string one","another string"
> "2","2","quotes escaped 10' 20"" 5' 30"" ""test string","final string"
> "3","3","third row","last "" col"
>
> # Importing test.csv works fine.
>
>> read.csv("test.csv")
> X X1 X2 X3
> 1 1 1 string one another string
> 2 2 2 quotes escaped 10' 20" 5' 30" "test string final string
> 3 3 3 third row last " col
> # this looks good.
> # now, please go and open "test.csv" with a text editor and replace all the double quotes '""' with the
> # quote escaped ' \" ' as is found in my data set. Like this:
>
> "","X1","X2","X3"
> "1","1","string one","another string"
> "2","2","quotes escaped 10' 20\" 5' 30\" \"test string","final string"
> "3","3","third row","last \" col"
Use quote="":
> read.csv(text='"","X1","X2","X3"
+ "1","1","string one","another string"
+ "2","2","quotes escaped 10\' 20"" 5\' 30"" ""test string","final string"
+ "3","3","third row","last "" col"', sep=",", quote="")
(That is, quote="", not quote="\"".)
X.. X.X1. X.X2. X.X3.
1 "1" "1" "string one" "another string"
2 "2" "2" "quotes escaped 10' 20"" 5' 30"" ""test string" "final string"
3 "3" "3" "third row" "last "" col"
You will then be depending entirely on commas to separate.
(Needed to use escaped single quotes to illustrate from a command line.)
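A quick way to see that dependence on commas, as a minimal sketch: count.fields() reports how many fields each line splits into under a given quote setting. The two sample lines are made up for illustration and are not from the original data.

```r
# Hypothetical two-line sample with a comma inside a quoted field.
lines <- c('"a","b"', '"1","x, y"')

# With quoting disabled, every comma is a separator.
con <- textConnection(lines)
n_noquote <- count.fields(con, sep = ",", quote = "")
close(con)

# With the double quote as the quote character, the embedded comma is protected.
con <- textConnection(lines)
n_quote <- count.fields(con, sep = ",", quote = "\"")
close(con)

print(n_noquote)
print(n_quote)
```

Running the same check on a real file (passing the file name instead of a connection) should show which lines gain extra fields when quote="" is used.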
>
> # this breaks read.csv:
>
>> read.csv("test.csv")
> X X1 X2 X3
> 1 1 1 string one another string
> 2 2 2 quotes escaped 10' 20\\ 5' 30\\ \\test string,final string\n3,3,third row,last \\ col
>
> # we now have only two rows, with all the data captured in col2 row2
>
> Any suggestions on how to fix this behavior? I've tried fiddling with quote="\"" to no avail, obviously. Interestingly, an even number of escaped quotes within a field is loaded correctly, which certainly threw me for a while!
>
> Thank you in advance,
> Tim
>
>
David Winsemius
Alameda, CA, USA