[R] read.csv : double quoted numbers
Aval Sarri
aval.sarri at gmail.com
Wed Aug 20 20:27:27 CEST 2008
Hello,
I am a new user of R, so please pardon me.
I am reading a .txt file that has 50+ numeric columns, with '\t'
as the separator. I am using the read.csv function along with colClasses, but
it fails to recognize double-quoted numeric values. (My numeric
values look like "1,001.23" and "1,008,000.456".) Basically,
read.csv fails with: scan() expected 'a real', got '"1,044.059"'
What I have tried, and the problems with each approach:
1) I tried scan and pipe, but I get the error message shown below. The
question here is how to replace all double quotes with nothing. I tried
enclosing the sed command in single quotes, but that does not help.
(The sed command does work from the shell.)
scan(pipe("sed -e s/\"//g DataAll.txt"), sep="\t")
sh: Syntax error: Unterminated quoted string
2) One solution I found on the mailing list was setAs(), described here:
http://www.nabble.com/Re%3A--R--read.table()-and-scientific-notation-p6734890.html
3) Other than using as.is=TRUE and then calling as.numeric on the numeric
columns, what is the solution? And how do I efficiently convert
50+ columns to numeric using a regular expression? All my
numeric column names start with the character 'X', so how do I use sapply
and/or a regular expression to convert all columns starting with X to
numeric? Is there an alternative method?
Basically, 2 and 3 both work, but which one is the efficient and correct way to do this?
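To make the comparison concrete, here is a minimal sketch of approaches 2 and 3 side by side. The sample data, the class name "num.with.commas", and the helper strip_commas are made up for illustration; a real file would be passed by name instead of through text=. It shows only that both routes give the same numbers on a tiny sample, not which one is faster.

```r
library(methods)

# Made-up sample: tab-separated, quoted numbers with thousands separators.
txt <- paste(
  "id\tX1\tX2",
  "a\t\"1,001.23\"\t\"1,008,000.456\"",
  "b\t\"1,044.059\"\t\"2.5\"",
  sep = "\n"
)

# Hypothetical helper: drop the commas, then convert to numeric.
strip_commas <- function(x) as.numeric(gsub(",", "", x, fixed = TRUE))

# Approach 2: register a coercion from character to a made-up class,
# then name that class in colClasses so conversion happens while reading.
setClass("num.with.commas")
setAs("character", "num.with.commas", function(from) strip_commas(from))
by_setas <- read.delim(
  text = txt,
  colClasses = c("character", "num.with.commas", "num.with.commas")
)

# Approach 3: read everything as character, then convert every column
# whose name starts with "X".
by_convert <- read.delim(text = txt, colClasses = "character")
x_cols <- grep("^X", names(by_convert))
by_convert[x_cols] <- lapply(by_convert[x_cols], strip_commas)

str(by_setas)
str(by_convert)
```

With 50+ columns, the colClasses vector for approach 2 would presumably be built from the header line rather than typed out by hand.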
(Also, what is the most efficient way to apply field-level validation and
conversion while reading a file? Does one have to read the file first, and
only after that can validation and conversion happen?)
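As a rough sketch of what validation during reading could look like, a setAs() coercion can check the values and stop with an error. The class name "checked.num" and the sample inputs are made up for illustration, and I am assuming that an empty field should also count as invalid.

```r
library(methods)

# Made-up class whose coercion converts and validates in one step.
setClass("checked.num")
setAs("character", "checked.num", function(from) {
  cleaned <- gsub(",", "", from, fixed = TRUE)
  out <- suppressWarnings(as.numeric(cleaned))
  # A field is bad if it was present but did not parse as a number.
  bad <- is.na(out) & !is.na(from)
  if (any(bad)) {
    stop("non-numeric field(s): ", paste(unique(from[bad]), collapse = ", "))
  }
  out
})

good_txt <- "X1\n\"1,001.23\"\n\"7\"\n"
bad_txt  <- "X1\n\"1,001.23\"\n\"abc\"\n"

# Valid input is converted while reading.
dat <- read.delim(text = good_txt, colClasses = "checked.num")

# Invalid input aborts the read; the message is captured here for display.
res <- tryCatch(
  read.delim(text = bad_txt, colClasses = "checked.num"),
  error = function(e) conditionMessage(e)
)
print(dat)
print(res)
```

The check still runs on a whole column after scan() has read it as character, so this is validation inside the read call rather than true field-by-field checking.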
Thanks for taking the time to read through this mail.
Thanks and Regards
-Aval