[R] Reading a tab delimited file of varying length using read.table
Rolf Turner
r.turner at auckland.ac.nz
Mon Jan 18 00:01:27 CET 2016
On 18/01/16 10:48, Uwe Ligges wrote:
> This is not a tab delimited file (as you apparently assume given the
> code), but a fixed width format, hence I'd try:
>
> url <- "http://data.princeton.edu/wws509/datasets/divorce.dat"
> widths <- c(9, 13, 10, 8, 10, 6)
> f5 <- read.fwf(url, widths = widths, skip = 1, strip.white = TRUE)
>
> names(f5) <- as.character(unlist(read.fwf(url, widths = widths,
> strip.white=TRUE, n=1)))
>
> Not sure why reading it simply with header=TRUE does not work, but no
> time to investigate this now.
Dear Uwe,
I have fiddled around a bit, and the situation looks to me like a bug
in read.fwf. For header=TRUE to work, the entries of the header
apparently need to be separated by the sep delimiter, which defaults to
"\t". In the case in question the entries are separated by blanks, so
presumably the header gets read in as a single entry rather than six,
leading to a mismatch between the length of the header and the number
of columns.
It seems that the specified widths get ignored when the header line is
dealt with.
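
A minimal sketch of the header behaviour described above, assuming a
small made-up three-column file (the names, values and widths here are
hypothetical, not the divorce data). The same body is written twice,
once with a blank-separated header and once with a tab-separated one:

```r
# Hypothetical fixed-width data: widths 6, 4 and 8 (not the divorce data).
widths <- c(6, 4, 8)
body <- c("Alice   30Auckland",
          "Bob     25Dunedin ")

# Same body, two different header styles.
f_blank <- tempfile()
f_tab   <- tempfile()
writeLines(c("name   age city", body), f_blank)     # header separated by blanks
writeLines(c("name\tage\tcity", body), f_tab)       # header separated by sep ("\t")

# Blank-separated header: the header is not split according to 'widths'.
res_blank <- tryCatch(
  read.fwf(f_blank, widths = widths, header = TRUE, strip.white = TRUE),
  error = function(e) e
)
if (inherits(res_blank, "error")) {
  message("blank-separated header failed: ", conditionMessage(res_blank))
}

# Tab-separated header: header entries match the default sep, so this works.
res_tab <- read.fwf(f_tab, widths = widths, header = TRUE, strip.white = TRUE)
print(res_tab)

unlink(c(f_blank, f_tab))
```

In the R versions I have tried, the first call stops with an error
while the second returns a data frame with the expected three names.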
It also seems that if one specifies sep="" then the header gets read
correctly, but strings of blanks are then interpreted as field
separators throughout, so blanks within fields result in the wrong
number of columns.
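
The whitespace-splitting effect can be seen in isolation with
count.fields, which counts the fields on each line for a given sep.
This is only an illustration of the mechanism, not a call to read.fwf
itself, and the three lines below are made up for the purpose:

```r
# Hypothetical fixed-width lines; "New York" contains a blank inside a field.
txt <- c("id  city      pop",
         "1   Auckland  1500",
         "2   New York  8400")

f <- tempfile()
writeLines(txt, f)

# With sep = "" any run of white space separates fields, so the last
# line is seen as having four fields instead of three.
n_fields <- count.fields(f, sep = "")
print(n_fields)   # 3 3 4

unlink(f)
```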
I think that the code of read.fwf is easy enough to fix; a slight
adjustment will make the header get treated the same way as the body of
the file.
I don't see any problems or drawbacks with doing so, and experimenting
with my modified function resulted in the divorce data being read in
with header=TRUE with no problems.
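
For anyone who wants the effect without touching read.fwf, here is a
rough sketch of a wrapper that splits the header line with the same
widths as the body. It is not my modified read.fwf; it is essentially
Uwe's two-step workaround wrapped in a function. The name
read_fwf_with_header and the test data are hypothetical, and the
wrapper assumes that file is a file name (or URL) that can be opened
twice:

```r
# Hypothetical helper: read the header line with the same 'widths' as the body.
read_fwf_with_header <- function(file, widths, ...) {
  # First pass: the header line only, kept as character strings.
  hdr <- read.fwf(file, widths = widths, n = 1, strip.white = TRUE,
                  colClasses = "character")
  # Second pass: the body, skipping the header line.
  dat <- read.fwf(file, widths = widths, skip = 1, strip.white = TRUE, ...)
  names(dat) <- trimws(unlist(hdr, use.names = FALSE))
  dat
}

# Made-up example data with a blank-separated header (widths 6, 4, 8).
f <- tempfile()
writeLines(c("name   age city",
             "Alice   30Auckland",
             "Bob     25Dunedin "), f)

dat <- read_fwf_with_header(f, widths = c(6, 4, 8))
print(dat)

unlink(f)
```

The file is read twice, which is wasteful for large files or slow
connections, but it keeps the sketch short.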
If this mod is made, I see no reason to keep the "sep" argument in
read.fwf --- except maybe for backward compatibility issues, and I don't
think there would be any since it never worked properly anyhow.
cheers,
Rolf
P. S. I can send you my modified version of read.fwf off-list if this
would be of any use to you.
R.
--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276