[R] count.fields inconsistent with read.table?

peter dalgaard pdalgd at gmail.com
Fri Feb 24 08:41:07 CET 2012


On Feb 24, 2012, at 06:58 , Sam Steingold wrote:

> Hi,
> 
> batch is a vector of lines returned by readLines from a
> NL-line-terminated file, here is the relevant section:
> =========================================================
> AA	BB	CC	DD			EE	FF
> GG	H
> 
> H	JJ	KK			LL	MM
> =========================================================
> as you can see, a line is corrupt; two CRLF's are inserted.

Actually, I don't see... (It's pretty hard to count TAB characters by eye.)


> This is okay, I drop the bad lines, at least I hope I do:
> 
>  conn <- textConnection(batch)
>  field.counts <- count.fields(conn, sep="\t", comment.char="", quote="")
>  close(conn)
>  good <- field.counts == 8  # this should drop all bad lines
>  if (!all(good))
>    batch <- batch[good]
>  conn <- textConnection(batch)
>  ret <- read.table(conn, sep="\t", comment.char="", quote="")
>  close(conn)
> 
> I get this error in read.table():
> 
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
>  line 7151 did not have 8 elements
> 
> how come?!

You can do better than this in terms of providing clues for us: "batch" is a character vector, right? So recheck that count.fields() returns all 8s after removal of the bad lines. Also check that the dimensions match -- is length(batch) actually the same as length(field.counts)? Finally, what is in line 7151?
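
A minimal sketch of those three checks, using a small made-up batch (3 fields per good line rather than 8; the data and the helper name count_fields are assumptions for illustration, not the poster's). One documented way for the lengths to differ is that count.fields() has blank.lines.skip = TRUE by default, so a blank element of batch produces no entry in the result. Whether that is what happened here cannot be told from the post.

```r
# Hypothetical stand-in for the poster's data: good lines have 3 fields, not 8.
batch <- c("a\tb\tc", "d\te", "", "e\tf", "g\th\ti")

# Wrap the poster's count.fields() call so the connection is always closed.
count_fields <- function(lines, ...) {
  conn <- textConnection(lines)
  on.exit(close(conn))
  count.fields(conn, sep = "\t", comment.char = "", quote = "", ...)
}

field.counts <- count_fields(batch)

# Check 1: do the dimensions match? If field.counts is shorter than batch,
# a logical index built from it no longer lines up with batch.
lengths.match <- length(field.counts) == length(batch)

# Counting blank lines too gives one entry per element of batch.
aligned.counts <- count_fields(batch, blank.lines.skip = FALSE)
good <- !is.na(aligned.counts) & aligned.counts == 3
cleaned <- batch[good]

# Check 2: after removal, every remaining line should have the expected count.
recount <- count_fields(cleaned)
all.ok <- length(recount) == length(cleaned) && all(recount == 3)

# Check 3: look at a suspect line directly (line 7151 in the original error).
print(batch[2])
```

If lengths.match is FALSE, batch[good] still runs without complaint, because a logical index shorter than the vector is silently recycled, and so the wrong lines can survive the filter.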

> 
> also, is there some error recovery?

Well, you can wrap the read.table() call in try().
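
Something along these lines, as a sketch only: the function name read_batch and the shape of the returned list are assumptions for illustration. Since try() catches the error, the function keeps running and can hand batch back to the caller instead of losing it.

```r
# Hypothetical wrapper: returns the parsed data on success, or the
# untouched batch plus the error message on failure.
read_batch <- function(batch) {
  conn <- textConnection(batch)
  on.exit(close(conn))
  ret <- try(read.table(conn, sep = "\t", comment.char = "", quote = ""),
             silent = TRUE)
  if (inherits(ret, "try-error")) {
    # The condition object attached by try() carries the original message.
    msg <- conditionMessage(attr(ret, "condition"))
    return(list(data = NULL, batch = batch, error = msg))
  }
  list(data = ret, batch = NULL, error = NULL)
}

# Made-up input: the second line has 2 fields instead of 3.
bad <- c("a\tb\tc", "d\te", "g\th\ti")
res <- read_batch(bad)
if (is.null(res$data)) {
  message("read.table failed: ", res$error)
  # res$batch is still available here for inspection or another attempt.
}
```

tryCatch() would do the same job with a handler function, if you prefer that to testing the class of the result.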

> e.g., the code above is a part of a function - is there a way to recover
> batch (without re-running the whole thing)?
> 
> Thanks!
> 
> -- 
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
> http://www.childpsy.net/ http://openvotingconsortium.org http://iris.org.il
> http://www.PetitionOnline.com/tap12009/ http://dhimmi.com
> Conscience is like a hamster: it is either asleep or gnawing.
> 

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd.mes at cbs.dk  Priv: PDalgd at gmail.com


