[R] A slight trap in read.table/read.csv.
Don MacQueen
macq at llnl.gov
Mon Mar 1 00:28:06 CET 2010
There is, however, an important distinction.
Quoting from ?TRUE (or ?logical):
'TRUE' and 'FALSE' are reserved words denoting logical constants
in the R language, whereas 'T' and 'F' are global variables whose
initial values set to these. All four are 'logical(1)' vectors.
> TRUE <- 3
Error in TRUE <- 3 : invalid (do_set) left-hand side to assignment
In other words, the rule is
T is TRUE unless otherwise defined by the user
(ditto for F)
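A minimal sketch of that rule, assuming a fresh R session with the code run at top level (the variable name res is illustrative). Assigning to TRUE fails. T can be shadowed by a user variable, and it reverts to TRUE once that variable is removed.

```r
# TRUE is a reserved word: "TRUE <- 3" parses but fails when evaluated,
# so it is wrapped in try() to let the script continue.
res <- try(eval(parse(text = "TRUE <- 3")), silent = TRUE)
print(inherits(res, "try-error"))   # [1] TRUE

# T is an ordinary variable, so a user assignment simply shadows base::T.
T <- 1:3
print(T)                            # [1] 1 2 3

# Removing the user's T makes the binding from package base visible again.
rm(T)
print(T)                            # [1] TRUE
```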
So this rule apparently applies to input from a file. Using
colClasses is then an example of "otherwise defined by the user."
I think it's logical (pun not particularly intended) and consistent
(though perhaps not ideal, but that's another question...)
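A minimal reproduction of Rolf Turner's one-line file and the colClasses workaround. It assumes R 2.14.0 or later, which added the text argument (newer than this thread). The names csv, d1 and d2 are illustrative. An NA entry in colClasses leaves that column to the default type guessing.

```r
# Rolf Turner's one-line file, supplied inline instead of read from disk.
csv <- '"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"'

# Default type guessing: the quoted "F" in 'gender' becomes logical FALSE.
d1 <- read.csv(text = csv)
str(d1$gender)   # logi FALSE

# colClasses is given per column; NA keeps the default guess for the first
# four columns, while the fifth ('gender') is forced to character.
d2 <- read.csv(text = csv, colClasses = c(NA, NA, NA, NA, "character"))
str(d2$gender)   # chr "F"
```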
-Don
At 5:37 PM -0500 2/28/10, Gabor Grothendieck wrote:
>It is strange. Even in R itself T and F are not guaranteed to be TRUE
>and FALSE.
>
>> T <- 1:3
>> T
>[1] 1 2 3
>
>
>On Sun, Feb 28, 2010 at 4:55 PM, Rolf Turner <r.turner at auckland.ac.nz> wrote:
>>
>> I had occasion recently to read in a one-line *.csv file that
>> looked like:
>>
>> "CandidateName","NSN","Ethnicity","dob","gender"
>> "Smith, Mary Jane",111222333,"E","2/25/1989","F"
>>
>> That "F" (for female) in the last field got transformed to
>> FALSE. Apparently read.csv (and the underlying read.table) infers
>> that if the entries of a field are all F's and T's then the
>> field is to be interpreted as logical.
>>
>> If I change the file to
>>
>> "CandidateName","NSN","Ethnicity","dob","gender"
>> "Smith, Mary Jane",111222333,"E","2/25/1989","F"
>> "Mingdinkler, Melvin Queue",999888777,"01/04/1942","M"
>>
>> then the read functions correctly interpret the last field
>> as being character.
>>
>> The translation of "F" into FALSE resulted in some mysterious
>> contretemps in further analysis, which it took me a while to
>> track down.
>>
>> I solved the problem by putting in a colClasses argument in my
>> call to read.csv(). But I really think that the read functions
>> are being too clever by half here. If field entries are surrounded
>> by quotes, shouldn't they be left as character? Even if they are
>> all F's and T's?
>>
>> Furthermore, using T's and F's to represent TRUEs and FALSEs is
>> bad practice anyway. Since FALSE and TRUE are reserved words it
>> would make sense for the read function to assume that a field is
>> logical if it consists entirely of these words. But T's and F's
>> .... I don't think so.
>>
>> I would argue that this behaviour should be changed. I can see no
>> downside to such a change.
>>
>> cheers,
>>
>> Rolf Turner
>>
>>
>> ______________________________________________
>> R-help at r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
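As documented in ?read.table, when colClasses is not given, every column is read as character and then converted with type.convert(). The quotes have already been stripped by then, so they do not protect "F" or "T". type.convert() chooses a type that can hold every value in the column. That explains why adding an "M" changes the result. The sketch below calls type.convert() directly on illustrative vectors. The names one, both and mix are hypothetical. as.is = TRUE keeps a non-convertible column as character instead of turning it into a factor.

```r
# A column containing only "F" (or only F's and T's) is guessed as logical.
one  <- type.convert("F", as.is = TRUE)
both <- type.convert(c("F", "T"), as.is = TRUE)

# A single value that is not T/F/TRUE/FALSE keeps the whole column character.
mix  <- type.convert(c("F", "M"), as.is = TRUE)

str(one)    # logi FALSE
str(both)   # logi [1:2] FALSE TRUE
str(mix)    # chr [1:2] "F" "M"
```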
--
---------------------------------
Don MacQueen
Lawrence Livermore National Laboratory
Livermore, CA, USA
925-423-1062
macq at llnl.gov