[R] The behaviour of read.csv().

Duncan Murdoch murdoch.duncan at gmail.com
Sun Dec 5 15:00:25 CET 2010


On 03/12/2010 7:08 AM, Duncan Murdoch wrote:
> On 02/12/2010 9:59 PM, Rolf Turner wrote:
>>
>> On 3/12/2010, at 3:48 PM, David Scott wrote:
>>
>>>    On 03/12/10 14:33, Duncan Murdoch wrote:
>>
>> 	<SNIP>
>>
>>>> I think the fill=TRUE option arrived about 10 years ago, in R 1.2.0.
>>>> The comment in the NEWS file suggests it was in response to some strange
>>>> csv file coming out of Excel.
>>>>
>>>> The real problem with the CSV format is that there really isn't a well
>>>> defined standard for it.  The first RFC about it was published in 2005,
>>>> and it doesn't claim to be authoritative.  Excel is kind of a standard,
>>>> but it does some very weird things.  (For example:  enter the string 01
>>>> into a field.  To keep the leading 0, you need to type it as '01.  Save
>>>> the file, read it back:  goodbye 0.  At least that's what a website I
>>>> was just on says about Excel, and what OpenOffice does.)
>>>>
>>>> I've been burned so many times by storing data in .csv files that I
>>>> just avoid them whenever I can.
>>> Absolutely agree with this, Duncan. Playing around with .csv files is
>>> like playing with some sort of unstable explosive. I also avoid them as
>>> much as possible.
>>
>> Where I work, everybody but me uses (yeuuccchhh!!!) Excel or SPSS.  If
>> we are to share data sets, *.csv files seem to be the most efficacious,
>> if not the only, way to go.
>
> I was going to suggest using DIF rather than CSV.  It contains more
> internal information about the file (including the type of each entry),
> but has the disadvantage of being less readable, even though it is ASCII.
>
> However, in putting together a little demo, I found a couple of bugs in
> the R implementation of read.DIF, and it looks as though it ignores the
> internal type information.  Sigh.

As of r53778, the bugs I noticed should be fixed.  read.DIF now respects 
the internal type information, so it will keep character strings like 
"001" as type character (unless you ask it to change the type).
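
For comparison, here is a minimal sketch of the same leading-zero issue on the CSV side. It is an illustration only, not taken from the read.DIF changes: the file contents and the column names (id, count) are made up, and it assumes a current version of R where colClasses accepts a named vector.

```r
# Write a tiny CSV whose first column has leading zeros.
# (Hypothetical data; the column names "id" and "count" are invented.)
path <- tempfile(fileext = ".csv")
writeLines(c("id,count", "001,5", "010,7"), path)

# With the default type conversion the id column becomes integer,
# so "001" and "010" come back as 1 and 10.
default <- read.csv(path)
str(default$id)

# Declaring the column as character keeps the strings as written.
kept <- read.csv(path, colClasses = c(id = "character"))
str(kept$id)

unlink(path)
```

A CSV file carries no type information of its own, so the reader has to be told which columns are character; the point of a format like DIF is that this information travels with the file.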

Duncan Murdoch

>
> Duncan Murdoch
>
>
>>
>> So far, we've had very few problems.  The one that started off this thread
>> is the only one I can think of that related to the *.csv format.
>>
>> At least *.csv files have the virtue of being ASCII files, whence if things
>> go wrong it is at least possible to dig into them with a text editor and
>> figure out just what the problem is.
>>
>> 	cheers,
>>
>> 		Rolf
>


