[R] Strange csv parsing problem
Peter Ehlers
ehlers at ucalgary.ca
Thu Apr 8 16:30:10 CEST 2010
Hadley,
The cause of the count.fields result (12 fields for line 4) is the
comma in 'nftc,%20' at about column 300 (for me).
Since commas between quotes should normally not matter, this
must be due to the comma appearing inside escaped quotes, i.e.
we have: "abc\"def,ghi\"jkl".
Remove the comma and count.fields gives 11 for all rows.
From your other post(s) on escaped quotes, I assume that
this won't solve your problem with the existing files. (:
Try this:
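Create a text file containing these three lines:

  "a,a"
  "\"bc\""
  "d\"e,f\"g"

Then, with 'file' set to the path of that file:

  count.fields(file, sep = ",")
  [1] 1 1 2

The third line is counted as two fields because of the comma
between the escaped quotes.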
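
For anyone who wants to run this without creating the file by hand, here is a minimal sketch. The temporary file and the variable names (lines, tmp, n_fields, fixed, n_fixed) are mine. The second half is only an assumption about a possible workaround: it rewrites each \" as a doubled quote (""), which is the embedded-quote form that scan() documents when sep is given, and then counts again. I have not tried it on Hadley's actual file.

```r
# The three test lines; in R source a literal backslash is written as \\.
lines <- c('"a,a"', '"\\"bc\\""', '"d\\"e,f\\"g"')

# Hypothetical stand-in for the hand-made text file.
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

# Field counts for the file as written (reported above as 1 1 2).
n_fields <- count.fields(tmp, sep = ",")
print(n_fields)

# Assumed workaround: replace every \" with "" and count again.
# fixed = TRUE makes gsub treat the pattern as a literal string.
fixed <- gsub('\\"', '""', readLines(tmp), fixed = TRUE)
con <- textConnection(fixed)
n_fixed <- count.fields(con, sep = ",")
close(con)
print(n_fixed)

unlink(tmp)
```

If the second count is the same for every line, the rewritten text could be passed to read.csv(text = fixed, header = FALSE) instead of the original file.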
-Peter Ehlers
On 2010-04-07 19:26, Hadley Wickham wrote:
>> url<- "http://dl.dropbox.com/u/41902/22240.csv"
>>
>> read.csv(url)[, 1]
> [1] "oppose" NA "oppose" "support"
>> read.csv(url, header = F)[, 1]
> [1] "url"
> [2] "http://maplight.org/us-congress/bill/109-hr-5825/387248"
> [3] "http://maplight.org/us-congress/bill/110-hr-3546/378743"
> [4] "http://maplight.org/us-congress/bill/111-s-908/365504"
> [5] "http://maplight.org/us-congress/bill/111-hr-3245/373358"
>>
>> count.fields(url, sep = ",")
> [1] 11 11 11 12 11
>
> This seems like it should be an error - I suspect it might be caused
> by the escaped quote (\") in line 4 column 432 causing the first
> column to be treated as column names:
>
>> read.csv(url, row.names = NULL)[, 1]
> [1] "http://maplight.org/us-congress/bill/109-hr-5825/387248"
> [2] "http://maplight.org/us-congress/bill/110-hr-3546/378743"
> [3] "http://maplight.org/us-congress/bill/111-s-908/365504"
> [4] "http://maplight.org/us-congress/bill/111-hr-3245/373358"
>
> Hadley
>
--
Peter Ehlers
University of Calgary