[R] count.fields inconsistent with read.table?
peter dalgaard
pdalgd at gmail.com
Fri Feb 24 08:41:07 CET 2012
On Feb 24, 2012, at 06:58, Sam Steingold wrote:
> Hi,
>
> batch is a vector of lines returned by readLines() from a
> newline-terminated file; here is the relevant section:
> =========================================================
> AA BB CC DD EE FF
> GG H
>
> H JJ KK LL MM
> =========================================================
> As you can see, a line is corrupt: two CRLFs have been inserted.
Actually, I don't see... (It's pretty hard to count TAB characters by eye.)
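Because TABs are invisible in most displays, one way to check a suspicious line is to let print() escape them, split on them, or count them directly. A minimal sketch; the string line below is a made-up example, not the actual content of the file:

```r
# Made-up example line; in practice this would be something like batch[i]
line <- "AA\tBB\t\tCC"

# print() shows embedded TABs escaped as \t (cat() would not)
print(line)

# Split on a literal TAB to see the individual fields, including empty ones
fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
print(fields)

# Count TABs: delete every non-TAB character, then measure what is left
n_tabs <- nchar(gsub("[^\t]", "", line))
print(n_tabs)
```

Counting TABs is slightly more robust than length(fields), because strsplit() drops a trailing empty field: "AA\tBB\t" splits into two pieces even though it has three TAB-separated fields.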
> This is okay: I drop the bad lines, or at least I hope I do:
>
> conn <- textConnection(batch)
> field.counts <- count.fields(conn, sep="\t", comment.char="", quote="")
> close(conn)
> good <- field.counts == 8 # this should drop all bad lines
> if (!all(good))
> batch <- batch[good]
> conn <- textConnection(batch)
> ret <- read.table(conn, sep="\t", comment.char="", quote="")
> close(conn)
>
> I get this error in read.table():
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> line 7151 did not have 8 elements
>
> how come?!
You can do better than this in terms of providing clues for us. "batch" is a character vector, right? So recheck that count.fields() returns all 8s after the bad lines have been removed. Also check that the dimensions match: is length(batch) actually the same as length(field.counts)? Finally, what is in line 7151?
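One thing the length check can reveal, given the blank line in the excerpt: count.fields() skips blank lines by default (blank.lines.skip = TRUE), so it can return fewer counts than there are elements in batch, and batch[good] then silently recycles the shorter logical index. A minimal sketch on made-up data (the helper count_tab_fields() is hypothetical), not a diagnosis of the actual file:

```r
# Made-up stand-in for 'batch': a blank line and a short line, roughly what
# stray line terminators inside one record could leave behind.
batch <- c("a\tb\tc", "", "d\te\tf", "g\th", "i\tj\tk")

# Hypothetical helper: count TAB-separated fields in a character vector.
count_tab_fields <- function(lines, skip_blank) {
  conn <- textConnection(lines)
  on.exit(close(conn))
  count.fields(conn, sep = "\t", comment.char = "", quote = "",
               blank.lines.skip = skip_blank)
}

# Default behaviour: the blank line gets no entry, so there are 4 counts
# for 5 lines, and the shorter logical index is silently recycled.
fc_default <- count_tab_fields(batch, skip_blank = TRUE)
bad_subset <- batch[fc_default == 3]   # still contains the short "g\th" line

# One count per element of 'batch', so the index lines up.
field.counts <- count_tab_fields(batch, skip_blank = FALSE)
stopifnot(length(field.counts) == length(batch))
clean <- batch[field.counts == 3]

# Re-check the cleaned lines before parsing them.
stopifnot(all(count_tab_fields(clean, skip_blank = TRUE) == 3))
ret <- read.table(text = clean, sep = "\t", comment.char = "", quote = "")
```

Passing blank.lines.skip = FALSE to count.fields() (or dropping empty strings from batch beforehand) keeps the counts and the lines the same length, so the logical index selects the lines it was computed for.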
>
> also, is there some error recovery?
Well, you can use try().
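A minimal sketch of that idea, assuming the goal is to keep batch around when parsing fails: wrap read.table() in try() and hand the input back together with the error message. The function parse_batch() and its return structure are hypothetical, not part of any package:

```r
# Hypothetical helper: parse 'batch', but keep it if read.table() fails.
parse_batch <- function(batch) {
  ret <- try(read.table(text = batch, sep = "\t", comment.char = "",
                        quote = ""),
             silent = TRUE)
  if (inherits(ret, "try-error")) {
    # Return the input together with the error message, so the caller
    # can inspect it without re-running the earlier steps.
    return(list(ok = FALSE,
                error = conditionMessage(attr(ret, "condition")),
                batch = batch))
  }
  list(ok = TRUE, data = ret)
}

res <- parse_batch(c("a\tb\tc", "d\te"))   # second line is one field short
res$ok      # FALSE
res$error   # the scan() message about the short line
```

For interactive debugging, options(error = recover) is another route: after an error it offers to browse the active call frames, including the one in which batch lives.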
> e.g., the code above is a part of a function - is there a way to recover
> batch (without re-running the whole thing)?
>
> Thanks!
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
> http://www.childpsy.net/ http://openvotingconsortium.org http://iris.org.il
> http://www.PetitionOnline.com/tap12009/ http://dhimmi.com
> Conscience is like a hamster: it is either asleep or gnawing.
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk Priv: PDalgd at gmail.com