[R] A slight trap in read.table/read.csv.
David Winsemius
dwinsemius at comcast.net
Sun Feb 28 23:33:41 CET 2010
On Feb 28, 2010, at 4:55 PM, Rolf Turner wrote:
>
> I had occasion recently to read in a one-line *.csv file that
> looked like:
>
> "CandidateName","NSN","Ethnicity","dob","gender"
> "Smith, Mary Jane",111222333,"E","2/25/1989","F"
>
> That "F" (for female) in the last field got transformed to
> FALSE. Apparently read.csv (and hence read.table) are inferring
> that if the entries of a file are all F's and T's then the
> field is interpreted as logical.
>
> If I change the file to
>
> "CandidateName","NSN","Ethnicity","dob","gender"
> "Smith, Mary Jane",111222333,"E","2/25/1989","F"
> "Mingdinkler, Melvin Queue",999888777,"01/04/1942","M"
>
> then the read functions correctly interpret the last field
> as being character.
>
> The translation of "F" into FALSE resulted in some mysterious
> contretemps in further analysis, which it took me a while to
> track down.
>
> I solved the problem by putting in a colClasses argument in my
> call to read.csv(). But I really think that the read functions
> are being too clever by half here. If field entries are surrounded
> by quotes, shouldn't they be left as character? Even if they are
> all F's and T's?
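
For anyone wanting to reproduce this, here is a minimal sketch. It assumes an R recent enough that read.csv() accepts a text= argument, so no file on disk is needed; the names csv_text, d1 and d2 are only illustrative. It shows the default conversion and then the colClasses workaround mentioned above.

```r
# Recreate the one-record file from the post as an in-memory string
# (assumption: read.csv() supports the text= argument in this R version).
csv_text <- '"CandidateName","NSN","Ethnicity","dob","gender"
"Smith, Mary Jane",111222333,"E","2/25/1989","F"'

# Default behaviour: the quoted "F" is still passed through type conversion
# and ends up as a logical value.
d1 <- read.csv(text = csv_text)
class(d1$gender)

# Workaround: declare the class of that column explicitly. A named
# colClasses vector is matched against the header names; columns that are
# not named are converted as usual.
d2 <- read.csv(text = csv_text, colClasses = c(gender = "character"))
d2$gender
```

Naming only the one column keeps the automatic conversion for the others, which may or may not be what you want in a real script.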
>
> Furthermore using F's and T's to represent TRUE's and FALSE's is
> bad practice anyway. Since FALSE and TRUE are reserved words it
> would make sense for the read function to assume that a field is
> logical if it consists entirely of these words. But T's and F's
> .... I don't think so.
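
The distinction between the reserved words and the one-letter names can be seen at the prompt. The following is only a sketch of that point: T and F are ordinary variables that can be masked, whereas an attempt to assign to FALSE fails. The name res is illustrative.

```r
# F is an ordinary variable defined in base, so it can be masked.
F <- "female"
identical(F, FALSE)   # no longer the logical constant

# Remove the masking copy; the base definition is visible again.
rm(F)
identical(F, FALSE)

# FALSE itself is a reserved word: assigning to it is an error.
res <- tryCatch(
  eval(parse(text = "FALSE <- 1")),
  error = function(e) "assignment to FALSE is an error"
)
res
```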
It is documented that conversion to logical will be attempted, so it
does make sense that T/F become TRUE and FALSE, since that is typical
behavior elsewhere. But at the very least this sentence in the
type.convert help page:

"Given a character vector, it attempts to convert it to logical,
integer, numeric or complex, and failing that converts it to factor
unless as.is = TRUE."

... ought to be clarified. It is not at all clear from that sentence
that conversion to logical will still be attempted even if as.is = TRUE,
i.e. that the only conversion as.is suppresses is the one to factor.
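
A small sketch of what I mean, calling type.convert() directly on character vectors. The vector names x and y are just for illustration.

```r
# A character vector made up only of "F" and "T".
x <- c("F", "T", "F")

# as.is = TRUE does not stop the conversion to logical ...
class(type.convert(x, as.is = TRUE))

# ... it only decides what happens when no other conversion applies.
y <- c("F", "M")
class(type.convert(y, as.is = TRUE))    # stays character
class(type.convert(y, as.is = FALSE))   # becomes factor

# The same reading of "T"/"F" is used by as.logical().
as.logical(c("T", "F"))
```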
>
> I would argue that this behaviour should be changed. I can see no
> downside to such a change.
>
> cheers,
>
> Rolf Turner
>