[R] Read function that detects format automatically

Jeroen Ooms jeroenooms at gmail.com
Thu Apr 28 04:22:17 CEST 2011


I was wondering whether there is a function that automatically tries to detect
the format of a data file. For example, if it is an ASCII data file, the
function could detect appropriate defaults for the read.table() parameters.
One could, for example, read the first 10 lines of the file, compare the
format of the first line with that of the others, count the number of dots,
colons and semicolons, etc. More generally, one could use the file extension
or, if available, the Unix 'file' command to determine the file type if it is
not ASCII.
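
To make the idea concrete, here is a rough sketch of the "read the first 10
lines and count separators" approach. The name guess_format() and the list of
candidate separators are my own assumptions, not an existing function. It
only handles simple files: quoted fields that contain a separator, or a file
with a single numeric column, will confuse it.

```r
# Hypothetical helper: guess sep, dec and header from the first n lines.
guess_format <- function(path, n = 10, candidates = c(",", ";", "\t", "|")) {
  lines <- readLines(path, n = n, warn = FALSE)
  lines <- lines[nzchar(lines)]
  if (length(lines) == 0) stop("no non-empty lines to inspect")

  # counts[i, j]: how often candidate j occurs in line i
  count_char <- function(ch) {
    lengths(regmatches(lines, gregexpr(ch, lines, fixed = TRUE)))
  }
  counts <- matrix(unlist(lapply(candidates, count_char)),
                   nrow = length(lines))

  # A plausible separator occurs equally often (and at least once) per line.
  consistent <- apply(counts, 2, function(x) x[1] > 0 && all(x == x[1]))
  if (!any(consistent)) return(NULL)
  ok <- which(consistent)
  sep <- candidates[ok[which.max(counts[1, ok])]]

  # Assumption: semicolon-separated files use a decimal comma (as read.csv2).
  dec <- if (sep == ";") "," else "."

  # Header heuristic: first line has no numeric fields, second line has some.
  is_num <- function(line) {
    fields <- strsplit(line, sep, fixed = TRUE)[[1]]
    fields <- sub(dec, ".", fields, fixed = TRUE)
    !is.na(suppressWarnings(as.numeric(fields)))
  }
  header <- length(lines) > 1 && !any(is_num(lines[1])) && any(is_num(lines[2]))

  list(sep = sep, dec = dec, header = header)
}

# Usage on a small csv2-style file written to a temporary location
tmp <- tempfile(fileext = ".csv")
writeLines(c("x;y", "1,5;2,5", "3,0;4,0"), tmp)
fmt <- guess_format(tmp)
dat <- read.table(tmp, sep = fmt$sep, dec = fmt$dec, header = fmt$header)
```

The guessed values are passed straight to read.table(), so a wrong guess can
still be overridden by hand.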

I think it should not be very complicated to detect formats with very high
accuracy. For most data files, a human statistician can easily see the format
by looking at a fragment, so it should be possible to capture these rules in
code. It would be nice to have something like a read.magic() function that
reads a data file using the appropriate command, regardless of whether the
user supplied a csv1, csv2, tab-delimited, Excel, SPSS, Stata, etc. file.
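
As a rough illustration of what I have in mind, the dispatch could start from
the file extension. This is only a sketch: read_magic() is a made-up name,
the extension-to-reader mapping is my assumption, Excel is left out because
base R has no reader for it, and the SPSS and Stata branches rely on the
'foreign' package.

```r
# Hypothetical dispatcher: pick a reader based on the file extension.
read_magic <- function(path) {
  ext <- tolower(tools::file_ext(path))
  if (!nzchar(ext)) stop("file has no extension to dispatch on")

  # Count occurrences of a character in the first line of the file.
  count_first <- function(ch) {
    first <- readLines(path, n = 1, warn = FALSE)
    sum(lengths(regmatches(first, gregexpr(ch, first, fixed = TRUE))))
  }

  switch(ext,
    # csv1 versus csv2: more semicolons than commas suggests read.csv2()
    csv = if (count_first(";") > count_first(",")) {
      read.csv2(path)
    } else {
      read.csv(path)
    },
    tsv = ,
    tab = read.delim(path),
    sav = foreign::read.spss(path, to.data.frame = TRUE),
    dta = foreign::read.dta(path),
    stop("unsupported or unrecognised extension: ", ext)
  )
}

# Usage: the same call reads both csv flavours
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
writeLines(c("a,b", "1.5,x"), f1)
writeLines(c("a;b", "1,5;x"), f2)
d1 <- read_magic(f1)
d2 <- read_magic(f2)
```

A fuller version would fall back to inspecting the file contents when the
extension is missing or misleading.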

I actually started to code something like this, but then I figured that
maybe someone else has had the exact same idea.




