[R] count.fields vs read.table

Mon Dec 5 09:02:19 CET 2005

On Mon, 5 Dec 2005, Peter Dalgaard wrote:

> "Andrew C. Ward" <acward at tpg.com.au> writes:
>
>> Dear R-help,
>>
>> I am using R 2.1.1 on Windows XP.
>>
>> I have a tab-delimited data file that has been exported by SAS. The file is reasonably big so I
>> apologise that I can't give a good toy example. I do this:
>>       table(count.fields("t1.txt", sep="\t", quote="\""))
>>       248
>>       809
>> So I have 809 lines, each with 248 fields.
>>
>> There's something wrong with me, my data or both, since when I try to read the data, I get this:
>>       dim(read.table("t1.txt", sep="\t", quote="\"", header=TRUE))
>>       [1] 425 248
>>
>> I wonder if someone could be kind enough to point out what I've done wrong or suggest some tips
>> for managing this, please? Thanks for your advice!
>
>
> Something around line 425 that causes the rest of the file to be
> gobbled? Quotes and comment characters could be the culprit, although
> the inconsistency with count.fields looks suspicious. Otherwise, I'd
> look at the data read and try to pinpoint the line where things go
> weird (e.g. the last handful of entries of the first column).

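The help page for count.fields explicitly says that it counts fields
line by line, whereas read.table allows embedded newlines in quoted
fields.  These days the two do not do exactly the same thing, so a file
can yield fewer rows from read.table than it has lines.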
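
To illustrate the difference, here is a minimal sketch on a made-up toy file, not the poster's data. The names path, n_lines, df and the column note are hypothetical, and it assumes the cause is a newline inside a quoted field, which may or may not be what is happening in t1.txt.

```r
# Hypothetical toy file: the second record has a newline inside a quoted field.
path <- tempfile(fileext = ".txt")
writeLines(c("id\tnote",
             "1\t\"one line\"",
             "2\t\"two",
             "lines\"",
             "3\t\"x\""), path)

# Number of physical lines in the file, header included.
n_lines <- length(readLines(path))
print(n_lines)

# count.fields works through the file line by line.
print(count.fields(path, sep = "\t", quote = "\""))

# read.table keeps the quoted newline inside the field, so the record
# that spans two physical lines becomes a single row.
df <- read.table(path, sep = "\t", quote = "\"", header = TRUE,
                 stringsAsFactors = FALSE)
print(dim(df))

# Rows whose text contains an embedded newline; a check like this on the
# character columns may help locate the records that span several lines.
print(which(grepl("\n", df$note, fixed = TRUE)))
```

If the real file behaves the same way, comparing length(readLines("t1.txt")) with the number of rows returned by read.table should show the same kind of gap.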

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595